Predicting acute kidney injury onset with a random forest algorithm using electronic medical records of COVID-19 patients: the CRACoV-AKI model

Abstract

Introduction: Acute kidney injury (AKI) is a serious and common complication of SARS-CoV-2 infection. Most risk assessment tools for AKI have been developed in the intensive care unit or in elderly populations. As the COVID-19 pandemic is transitioning into an endemic phase, there is an unmet need for prognostic scores tailored to the population of patients hospitalized for this disease.

Objectives: We aimed to develop a robust predictive model for the occurrence of AKI in hospitalized patients with COVID-19.

Patients and methods: Electronic medical records of all adult inpatients admitted between March 2020 and January 2022 were extracted from the database of a large tertiary care center with reference status in Lesser Poland. We screened 5806 patients with SARS-CoV-2 infection confirmed by a polymerase chain reaction test. After excluding individuals with missing serum creatinine data and those with a mild disease course (<7 days of inpatient care), a total of 4630 records were considered. Data were randomly split into training (n = 3462) and test (n = 1168) sets. A random forest model was developed using expert-guided feature engineering and tuned with performance metrics evaluated in nested cross-validation to reduce bias.

Results: Nested cross-validation yielded an area under the curve ranging from 0.793 to 0.807, with an average of 0.798. Global model explanation based on permutation tests suggested that the need for respiratory support, chronic kidney disease, and procalcitonin concentration were among the most important variables.

Conclusions: The CRACoV-AKI model enables AKI risk stratification among hospitalized patients with COVID-19. Machine learning–based tools may thus offer additional decision-making support for specialist providers.

What’s new?

Throughout the COVID-19 pandemic, acute kidney injury (AKI) has emerged as one of the most common complications among hospitalized patients. Although several case-control studies of incident AKI have been conducted, few reports have proposed reliable prognostic measures, and cross-population differences limit the generalizability of such tools. To predict the occurrence of AKI in the Polish population of patients with COVID-19, we developed and internally validated a random forest model using information from a large database of a regional reference center. Model features were predefined by experts and derived from medical chart review and laboratory test results. The model was tuned using nested cross-validation, and satisfactory performance was confirmed on a separate test set. Owing to its design, the CRACoV-AKI model is expected to be a robust tool for forecasting AKI. It is electronically accessible at https://kalkulator.covid.su.krakow.pl/kalkulator-ryzyka.

Introduction

The COVID-19 pandemic, caused by SARS-CoV-2, remains a public health concern as it transitions to an endemic phase in 2023/2024. Clinical manifestations range from asymptomatic or paucisymptomatic infection to severe disease necessitating intensive care.^1,2

The reported prevalence of acute kidney injury (AKI) ranges from 0.5% to 80% across COVID-19 cohorts in the literature.³ A pooled analysis estimated AKI incidence at 10%, with 4% of patients requiring renal replacement therapy (RRT).⁴ Among hospitalized patients in general, AKI is an increasingly common condition, estimated in a prospective study to occur in approximately 5% to 7% of cases.⁵ Development of acute renal failure has been consistently associated with prolonged hospital stay and a 6-fold increase in the risk of death. In patients with serum creatinine (SCr) levels above 3 mg/dl, the mortality rate ranges from 40% to 60%.^5,6 However, even relatively small changes in SCr levels can be associated with worse clinical outcomes.⁷ On a socioeconomic level, AKI has been estimated to increase hospital costs in the United States by as much as 24 billion USD, with an added cost of 11 000 to 42 000 USD per hospitalization.⁸ Current practice guidelines recommend an array of supportive measures for the treatment of AKI.⁹ However, effective treatment remains limited by delayed diagnosis, which often occurs only after significant injury to the renal tissue. At this stage, prophylactic and preconditioning measures are less successful, which emphasizes the importance of early detection.¹⁰

Screening inpatients for the risk of AKI is recommended. However, the current definition of AKI is based on measures that preclude early detection, as they reflect kidney damage that has already occurred. Developing predictive models or identifying sensitive biomarkers for AKI is a research and practice objective uniformly endorsed by experts in the field.¹¹ Data from randomized trials suggest that both supportive care and timely RRT are successful in reducing AKI occurrence when implemented in high-risk patients.¹² Several risk factors have been identified in previous reports, and models that aid in risk stratification have been proposed. However, these approaches are difficult to integrate into clinical practice because of concerns over sample heterogeneity and related underperformance among COVID-19 patients.

During the initial waves of the pandemic, the Polish health care system was organized into a network of hospitals dedicated to treating patients with SARS-CoV-2 infection. The University Hospital in Kraków was designated as a priority center for the care of COVID-19 patients, covering a region with over 3 million residents.¹³ With the exception of a surgical ward and the internal medicine and neurology units, all departments admitted only patients with SARS-CoV-2 infection.

Separate pathways to the COVID-19 departments, radiology rooms, and operating theaters were also established for SARS-CoV-2–positive patients to prevent hospital-acquired infection and protect medical staff and non–SARS-CoV-2 patients. A dedicated protocol of basic laboratory analysis for patients with SARS-CoV-2 infection was developed early on in our hospital.

Using data gathered from this large inpatient cohort within the CRACoV in CoVid pandemic – Home, Hospital, and Staff (CRACoV-HHS) project, this study aimed to develop practical risk prediction models for prolonged hospitalization and AKI as tools to assist specialist providers in the clinical decision-making process throughout the COVID-19 pandemic.

Patients and methods

Data source and sampling

We extracted data from the electronic medical records (EMRs) of 5806 patients treated in a tertiary care hospital designated as the reference center for the region of Lesser Poland (ca. 3.4 million inhabitants) during the initial stages of the COVID-19 pandemic. Duplicates were removed by screening patient IDs to reduce the potential bias associated with multiple health care encounters of the same patient. EMRs are continuously updated by medical professionals during all inpatient and outpatient visits, and all biochemical and diagnostic investigations are recorded electronically. Such records can be considered reliable, as medical documentation is mandated by law in Poland and ensures adequate coverage of health care claims.

We included all patients aged 18 years or older who had a documented positive SARS-CoV-2 polymerase chain reaction test result and were admitted to the hospital between March 2020 and January 2022. The cohort was restricted to patients hospitalized for at least 7 days to obtain a more homogeneous sample. This decision reflected differences in the availability of biochemical and diagnostic test results among patients who presented to the emergency department, those who were subsequently admitted, and those with an extended length of stay. The latter group was designated as the target population of interest because of its increased risk of AKI. After excluding individuals with missing SCr data and those with a mild disease course (<7 days of inpatient care), a total of 4630 patients were enrolled. They were randomly split into training (n = 3462) and test (n = 1168) cohorts.

Procedures and patient care

A dedicated protocol of basic laboratory analysis was developed in our hospital during the initial stages of the COVID-19 pandemic. Every patient treated in the emergency department had the same laboratory panel performed, comprising complete blood count (CBC) as well as C-reactive protein (CRP), SCr, lactate dehydrogenase (LDH), and electrolyte levels. During the initial 24 hours, another predefined panel of laboratory tests was performed (CBC with differential, CRP, procalcitonin [PCT], LDH, ferritin, interleukin 6 [IL-6], electrolytes, SCr, liver enzymes, and urinalysis). Sequential panel testing was carried out on days 7, 14, 21, and 28. Supplemental laboratory testing was performed when clinically indicated.

Treatment of SARS-CoV-2 infection followed designated standard operating procedures and the respective protocols for COVID-19 wards. The therapeutic strategy was approximated using several dichotomous features, including the use of low-molecular-weight heparin, glucocorticosteroids (intravenous, oral, or inhaled), remdesivir, tocilizumab, and oxygen therapy modalities.

Definitions

We defined AKI according to the Kidney Disease: Improving Global Outcomes (KDIGO) 2012 definition¹⁴ as an absolute increase in SCr concentration of at least 0.3 mg/dl (26.5 μmol/l) within 48 hours, an increase in SCr of at least 50% from baseline within 7 days, or a urine volume of less than 0.5 ml/kg/h for at least 6 hours. AKI severity was graded into 3 stages, also according to the KDIGO criteria. As SCr concentration and urine output were used to identify the end point, they were not included as predictive factors in subsequent analyses.
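The creatinine part of the KDIGO definition can be expressed as a small function. The sketch below is illustrative only and assumes SCr values in mg/dl with timestamps in hours since admission (the `Measurement` class and `kdigo_scr_stage` function are hypothetical). The reference value used here, the lowest SCr in the preceding 48 hours or 7 days, is our assumption, since the baseline definition is not specified above; the urine output criteria and initiation of RRT are not modelled.

```python
from dataclasses import dataclass

EPS = 1e-9  # float tolerance, so that eg 1.2 - 0.9 still counts as a 0.3 mg/dl rise


@dataclass(frozen=True)
class Measurement:
    hours: float  # hypothetical field: time since admission, in hours
    scr: float    # serum creatinine, mg/dl


def kdigo_scr_stage(series):
    """Highest KDIGO stage (0 = no AKI) reached, using creatinine criteria only.

    Assumptions (not from the study): the reference value is the lowest SCr in
    the preceding 48 h (absolute rule) or 7 days (relative rule); urine output
    and renal replacement therapy are not modelled.
    """
    series = sorted(series, key=lambda m: m.hours)
    worst = 0
    for i, now in enumerate(series):
        earlier = series[:i]
        last_48h = [m.scr for m in earlier if now.hours - m.hours <= 48]
        last_7d = [m.scr for m in earlier if now.hours - m.hours <= 7 * 24]
        if not last_7d:
            continue  # nothing to compare against yet
        ratio = now.scr / min(last_7d)
        rise_48h = bool(last_48h) and now.scr - min(last_48h) >= 0.3 - EPS
        if ratio < 1.5 - EPS and not rise_48h:
            continue  # AKI definition not met at this measurement
        if ratio >= 3.0 - EPS or now.scr >= 4.0:
            stage = 3  # >= 3x reference, or SCr reaching >= 4.0 mg/dl
        elif ratio >= 2.0 - EPS:
            stage = 2  # 2.0-2.9x reference
        else:
            stage = 1  # 1.5-1.9x reference, or >= 0.3 mg/dl rise within 48 h
        worst = max(worst, stage)
    return worst


# Usage with hypothetical values
stable = [Measurement(0, 1.0), Measurement(24, 1.05), Measurement(48, 1.1)]
print(kdigo_scr_stage(stable))                                       # 0
print(kdigo_scr_stage([Measurement(0, 0.9), Measurement(24, 1.2)]))  # 1
print(kdigo_scr_stage([Measurement(0, 1.0), Measurement(72, 2.1)]))  # 2
```

Staging follows the KDIGO thresholds (1.5–1.9, 2.0–2.9, and at least 3.0 times the reference value, with SCr of at least 4.0 mg/dl also qualifying for stage 3); a complete end-point pipeline would also need the urine output and RRT criteria.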

Definitions of clinical features utilized in the final model are expanded upon in Supplementary material, Definitions.

Model-building process

The random forest (RF) algorithm, as implemented in the “ranger” package, was selected as the model engine because of its flexibility, ability to model nonlinear relationships, and relative resistance to overfitting.¹⁵ Variables were selected by an expert advisory team that adjudicated their clinical relevance. Prior to the analyses, we identified extreme outliers and winsorized them to reference bounds based on laboratory assay calibration. Cases of suspected data entry error were re-examined and corrected, if applicable. Missing data were imputed using the “mice” package,¹⁶ with each variable imputed in turn by a univariate RF model under fully conditional specification. Hyperparameters were optimized by random search using the “mlr3” package.¹⁷ Nested 3-fold cross-validation¹⁸ was used to assess model performance during model building, with final validation on the test set. Model performance was analyzed using the DALEX package.¹⁹ Variable importance and partial dependence plots were used to summarize the impact of individual variables, and individual cases were analyzed using the breakdown technique.²⁰
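The nested cross-validation scheme can be illustrated with a minimal, dependency-free Python sketch; the study itself used R (mlr3), and this version only mirrors the resampling logic under the assumption of a binary outcome. An inner 3-fold loop scores randomly sampled hyperparameters, and an outer 3-fold loop measures the AUC of the refitted winner on data never seen during tuning. The toy k-nearest-neighbour scorer `fit_knn` is a hypothetical stand-in for the RF model, and all function names are illustrative.

```python
import random
from statistics import mean


def stratified_folds(labels, k, rng):
    """Split positions 0..n-1 into k folds with similar class balance."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def auc(labels, scores):
    """Probability that a random positive scores above a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def split(X, y, held_out):
    """Return the training part (everything not held out)."""
    held = set(held_out)
    train = [i for i in range(len(y)) if i not in held]
    return [X[i] for i in train], [y[i] for i in train]


def cv_auc(X, y, fit, param, k, seed):
    """Mean AUC of one hyperparameter setting over k stratified folds."""
    scores = []
    for held_out in stratified_folds(y, k, random.Random(seed)):
        predict = fit(*split(X, y, held_out), param)
        scores.append(auc([y[i] for i in held_out], [predict(X[i]) for i in held_out]))
    return mean(scores)


def nested_cv(X, y, fit, param_space, n_iter=3, outer_k=3, inner_k=3, seed=0):
    """Outer loop estimates performance; inner loop tunes by random search."""
    rng = random.Random(seed)
    outer_aucs = []
    for test_idx in stratified_folds(y, outer_k, rng):
        X_tr, y_tr = split(X, y, test_idx)
        candidates = rng.sample(param_space, min(n_iter, len(param_space)))
        # Every candidate sees the same inner folds, built from outer-training data only.
        best = max(candidates, key=lambda p: cv_auc(X_tr, y_tr, fit, p, inner_k, seed))
        predict = fit(X_tr, y_tr, best)  # refit on all outer-training data
        outer_aucs.append(auc([y[i] for i in test_idx], [predict(X[i]) for i in test_idx]))
    return outer_aucs


def fit_knn(X_train, y_train, k):
    """Toy stand-in for a random forest: share of positives among the k nearest points."""
    pairs = list(zip(X_train, y_train))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(label for _, label in nearest) / k

    return predict


# Hypothetical, trivially separable one-feature data
X = [float(i) for i in range(30)] + [float(i) for i in range(1000, 1030)]
y = [0] * 30 + [1] * 30
print(nested_cv(X, y, fit_knn, param_space=[1, 3, 5, 7]))  # [1.0, 1.0, 1.0]
```

Because tuning never touches the outer test fold, the outer AUCs are not inflated by hyperparameter selection, which is the optimistic bias nested cross-validation is meant to remove. The synthetic classes are far apart, so every outer fold scores 1.0 here; real data would give the kind of spread reported above.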

Continuous variables are summarized as mean (SD) and median (interquartile range [IQR]), categorical variables as counts and frequencies, and the missing data rate is reported for each variable. Cross-group comparisons were performed using the Mann–Whitney or Kruskal–Wallis test for continuous measures and the χ² test for discrete variables. All tests were 2-tailed, and a P value below 0.05 was considered significant.
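A baseline-table summary of one continuous variable could be computed as follows. This is a sketch with a hypothetical `summarize` function, assuming that missing values are coded as `None`; the quantile method is chosen to match R's default.

```python
from statistics import mean, median, quantiles, stdev


def summarize(values):
    """Baseline-table summary of one continuous variable; None marks a missing value."""
    observed = [v for v in values if v is not None]
    # method="inclusive" matches R's default quantile() (type 7)
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    return {
        "n": len(observed),
        "missing_pct": 100 * (len(values) - len(observed)) / len(values),
        "mean": mean(observed),
        "sd": stdev(observed),  # sample SD (n - 1 denominator), as in R's sd()
        "median": median(observed),
        "iqr": (q1, q3),
    }


# Hypothetical CRP values (mg/l) with one missing entry
print(summarize([12.0, 40.5, None, 8.2, 95.0, 23.1]))
```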

Statistical analysis was carried out in R (R Core Team, 2023; R Foundation for Statistical Computing, Vienna, Austria).

Model validation

Validation of the model-building procedure was also carried out on an out-of-time sample. For this purpose, the 20% most recent observations were set aside as a separate sample in which model performance was tested; patients admitted to the hospital in March 2021 or later were eligible for inclusion in this out-of-time set. Of the remaining 80% of observations, 80% were assigned to the training set and 20% to the validation set. Separating the validation set was crucial to provide a baseline for the area under the curve (AUC) values obtained on the out-of-time set.
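The out-of-time scheme can be sketched as a date-ordered split followed by a random split of the remaining data. This illustration rests on assumptions: records are hypothetical `(admission_date, payload)` pairs, shares are rounded to whole patients, and the seed is arbitrary.

```python
import random
from datetime import date, timedelta


def out_of_time_split(records, oot_share=0.2, val_share=0.2, seed=0):
    """records: (admission_date, payload) pairs. Returns (train, validation, out_of_time)."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = round(len(ordered) * (1 - oot_share))
    in_time, out_of_time = ordered[:cut], ordered[cut:]  # most recent share held out
    shuffled = in_time[:]
    random.Random(seed).shuffle(shuffled)  # random train/validation split within in-time data
    n_val = round(len(shuffled) * val_share)
    return shuffled[n_val:], shuffled[:n_val], out_of_time


# Hypothetical weekly admissions starting in March 2020
records = [(date(2020, 3, 2) + timedelta(weeks=i), f"patient-{i}") for i in range(100)]
train, validation, oot = out_of_time_split(records)
print(len(train), len(validation), len(oot))  # 64 16 20
```

Because the validation set comes from the same period (and distribution) as the training data, its AUC serves as the reference against which drift in the out-of-time AUC can be judged.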

The AUC value that was obtained in the entire validation sample was 0.782, and that obtained in the out-of-time set was 0.794. To check the stability of the model performance over time, the AUC was calculated for each month. Since the sample size in each month after March 2021 varied greatly, the cumulative AUC was also taken into account, understood as the AUC calculated for a sample of data aggregated over time starting from March 2021. For illustrative comparison, see Figure 1 and Supplementary material, Figure S1.

**Figure 1**. Line graph illustrating areas under the curve (AUCs) on a monthly basis for sequentially aggregated sample data from the out-of-time set. A reference threshold of AUC values was derived based on validation data (derived from the same distribution as the training data). The dashed line represents AUC for validation data (0.782).
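Monthly and cumulative AUCs can be computed with the Mann–Whitney formulation of the AUC, that is, the probability that a randomly chosen patient with AKI receives a higher predicted risk than one without AKI. The sketch assumes hypothetical `(month, label, score)` rows; a month containing only one outcome class has an undefined monthly AUC, returned as `None`, whereas the cumulative AUC stays defined once both classes have been seen.

```python
from collections import defaultdict


def auc(labels, scores):
    """Mann-Whitney form of the AUC; None when one class is absent (undefined)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def monthly_and_cumulative_auc(rows):
    """rows: (month 'YYYY-MM', label, score). Returns {month: (monthly AUC, cumulative AUC)}."""
    by_month = defaultdict(list)
    for month, label, score in rows:
        by_month[month].append((label, score))
    result, pooled = {}, []
    for month in sorted(by_month):  # 'YYYY-MM' strings sort chronologically
        pooled.extend(by_month[month])
        monthly = auc(*zip(*by_month[month]))
        cumulative = auc(*zip(*pooled))  # all out-of-time data up to this month
        result[month] = (monthly, cumulative)
    return result


# Hypothetical out-of-time predictions
rows = [
    ("2021-03", 1, 0.9), ("2021-03", 0, 0.2),
    ("2021-04", 1, 0.3), ("2021-04", 0, 0.6),
    ("2021-05", 1, 0.8),
]
for month, (monthly, cumulative) in monthly_and_cumulative_auc(rows).items():
    print(month, monthly, cumulative)
```

In this toy example the April AUC alone is 0, yet the cumulative value is 0.75, which shows why pooling over time gives a steadier picture when monthly samples are small.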

Ethical statement

The Jagiellonian University Bioethics Committee approved this study (1072.6120.333.2020). The research was performed in accordance with the Declaration of Helsinki and good clinical practice guidelines. This study also adheres to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) recommendations.¹⁷

Results

Out of the whole dataset (n = 5806), we derived a prolonged stay (PS) cohort (n = 4691). After preprocessing (eg, exclusion of missing SCr data), model development datasets were constructed based on a random sample split procedure: a PS training (n = 4691) and PS test set (n = 1115), and an AKI training (n = 3607) and AKI test (n = 1084) set. A descriptive summary comparing baseline clinical characteristics among individuals stratified by AKI occurrence, chronic kidney disease (CKD) stage, and AKI stage is provided in Supplementary material, Tables S1–S3, respectively.

Overall, the crude mortality rate was already higher at AKI stage 1 (n = 301 [36%]) than in individuals without AKI (n = 648 [16%]). Furthermore, greater AKI severity was consistently associated with a higher mortality rate (46% for stage 2 and 48% for stage 3). The median [IQR] duration of hospitalization was also longer starting at AKI stage 1 (14 [9–21] vs 18 [13–29] days for stages 2 and 3). However, because numerous clinical characteristics differed between participants with and without incident AKI, a more thorough evaluation would require a dedicated investigation.

For the final model developed in this study, nested cross-validation yielded AUCs ranging from 0.793 to 0.807, with an average of 0.798. For pragmatic use, we also developed a simplified clinical model using minimal biochemical data (CRP, myoglobin, and CBC parameters), which yielded an average AUC of 0.749. Discriminatory performance is illustrated with receiver operating characteristic curves, compared between the test and training samples with gray-shaded confidence bands (Figure 2A). The global model breakdown for the most important features is presented in Figure 2B and 2C.

**Figure 2**. A – receiver operating characteristic (ROC) curves for model test and training samples; B, C – variable importance plots based on permutation tests
Abbreviations: AKI, acute kidney injury; AUC, area under the curve; CKD, chronic kidney disease; CRP, C-reactive protein; HTN, hypertension; KTx, kidney transplant; PCT, procalcitonin
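Permutation-based variable importance, as used for Figure 2B and 2C, measures how much discrimination drops when one feature is shuffled across patients. The sketch below assumes an AUC-based loss (similar in spirit to DALEX's `model_parts` with a 1 − AUC loss) and uses a hypothetical toy model and data; it is not the CRACoV-AKI model.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Mann-Whitney AUC: chance a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, labels, feature, n_repeats=20, seed=0):
    """Mean drop in AUC after shuffling one feature across patients."""
    rng = random.Random(seed)
    baseline = auc(labels, [predict(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        values = [r[feature] for r in rows]
        rng.shuffle(values)  # breaks the link between this feature and the outcome
        permuted = [{**r, feature: v} for r, v in zip(rows, values)]
        drops.append(baseline - auc(labels, [predict(r) for r in permuted]))
    return mean(drops)


def predict(r):
    """Hypothetical model that uses PCT and ignores the noise column."""
    return r["pct"]


rows = [{"pct": i / 10, "noise": (7 * i) % 10} for i in range(20)]
labels = [int(r["pct"] >= 1.0) for r in rows]
for feature in ("pct", "noise"):
    print(feature, round(permutation_importance(predict, rows, labels, feature), 3))
```

A feature the model ignores scores exactly zero, while shuffling an informative feature pulls the AUC toward 0.5; averaging over repeats reduces the noise from any single permutation.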

Three sample cases of clinical interest were devised to study the contribution of variables to model-based prognosis (Figure 3A–3F). We used breakdown techniques to evaluate how individual clinical features contribute to the final prognostic score and whether this contribution is moderated by the presence of established kidney disease. After additional testing to evaluate the importance of temporal changes and the significance of baseline CKD status as a model feature, we achieved consistent and satisfactory predictive performance regardless of the period of hospitalization (Figure 1; Supplementary material, Figure S1) or initial knowledge of CKD status (Figure 4). Corresponding feature-breakdown plots are provided for the simplified model (Supplementary material, Figure S2). Comparing the left- and right-hand columns of Figure 3 shows how the presence of renal disease affects the prognostic contribution of particular variables.

**Figure 3**. Explanatory model breakdown plots for designated sample clinical cases. Scenarios are stratified by the presence of chronic kidney disease (CKD) to illustrate the contribution of clinical features to model predictions. The following clinical scenarios were examined: A, B – a 31-year-old woman with no contributory medical conditions and modest elevation of acute-phase and cell lysis parameters; C, D – a 45-year-old man requiring ventilation support, with increased inflammatory burden; E, F – a 70-year-old overweight patient with hypertension and no pre-existing kidney disease.
Abbreviations: CHD, coronary heart disease; LDH, lactate dehydrogenase; T2DM, type 2 diabetes mellitus; others, see Figures 1 and 2
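The breakdown technique used for Figure 3 can be sketched as follows: starting from the mean prediction over a reference dataset, variables are fixed to the patient's values one at a time, and each shift in the mean prediction is attributed to that variable. This is a simplified illustration with a hypothetical additive `toy_risk` score and a hand-picked variable order; for nonadditive models such as an RF, contributions depend on that order, which DALEX by default chooses with a heuristic based on individual variable effects.

```python
from statistics import mean


def break_down(predict, data, instance, order):
    """Sequential break-down attributions for one patient.

    Variables are fixed to the patient's values one at a time, in `order`; each
    contribution is the resulting shift in the mean prediction over `data`.
    """
    current = [dict(row) for row in data]
    baseline = mean(predict(r) for r in current)
    previous, contributions = baseline, {}
    for feature in order:
        for row in current:
            row[feature] = instance[feature]
        level = mean(predict(r) for r in current)
        contributions[feature] = level - previous
        previous = level
    return baseline, contributions


def toy_risk(r):
    """Hypothetical additive risk score (not the CRACoV-AKI model)."""
    return 0.3 * r["ckd"] + 0.01 * r["age"]


data = [{"ckd": 0, "age": 40}, {"ckd": 1, "age": 60}]
patient = {"ckd": 1, "age": 70}
baseline, parts = break_down(toy_risk, data, patient, order=["ckd", "age"])
print(round(baseline, 3), {k: round(v, 3) for k, v in parts.items()})
```

Once every variable used by the model has been fixed, the baseline plus all contributions equals the patient's own prediction, which is what makes the resulting waterfall plot additive.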

**Figure 4**. Receiver operating characteristic curves for additional model testing according to the presence of chronic kidney disease (CKD) as an indicator feature

To understand the relationship between model predictions and input features, we examined partial dependence plots (Supplementary material, Figures S3–S5). For most continuous variables (Supplementary material, Figure S3), a near-linear relationship was observed in the regions with the highest data density, whereas the curves flattened toward the tails. A monotonic relationship was nevertheless maintained for all variables. The potential interaction of selected variables with CKD status is shown in Supplementary material, Figure S4. Among categorical variables, increasing respiratory support requirements, presence of circulatory failure, and prior kidney transplant were among the most influential factors from a global perspective (Figure 2B and 2C; Supplementary material, Figure S6). Of interest, while the partial dependence curves for myoglobin were relatively consistent, variability in dependence was observed for hemoglobin and neutrophil count.
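A partial dependence profile averages the model prediction over the dataset while one feature is set to each value on a grid. The sketch uses a hypothetical `toy_risk` score whose myoglobin effect plateaus above an arbitrary cutoff, mimicking the tail flattening described above; it is not the study model, and the grid values are illustrative.

```python
from statistics import mean


def partial_dependence(predict, data, feature, grid):
    """Average prediction when `feature` is set to each grid value for every patient."""
    return [mean(predict({**row, feature: value}) for row in data) for value in grid]


def toy_risk(r):
    """Hypothetical score whose myoglobin effect plateaus above 500 ng/ml."""
    return 0.2 * r["ckd"] + min(r["myoglobin"], 500) / 1000


data = [{"ckd": 0, "myoglobin": 80}, {"ckd": 1, "myoglobin": 350}]
grid = [100, 300, 500, 700]
curve = partial_dependence(toy_risk, data, "myoglobin", grid)
print([round(v, 3) for v in curve])  # [0.2, 0.4, 0.6, 0.6]
```

Computing the same profile separately for patients with and without CKD is one way to look for the kind of interaction shown in Supplementary material, Figure S4.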

Additionally, we compared test models for AKI with and without information on CKD in terms of their performance and the global importance of specific features (Figure 4; Supplementary material, Figure S6).

Discussion

While COVID-19 is primarily an acute respiratory condition, renal injury is a common finding with prognostic significance. Cohort studies of patients with SARS-CoV-2 infection have associated AKI with the duration of hospitalization and mortality.²¹ To date, the largest meta-analysis estimated the pooled incidence of AKI in COVID-19 patients at 20%; close to 40% of these patients required RRT and had a higher risk of early death.²² AKI in COVID-19 patients is suspected to have long-term sequelae, with an increased risk of CKD in individuals who initially presented with adequate kidney function.^21,23 This appears to contrast with other forms of AKI, in which most patients fully recover organ function.²⁴ Given the epidemic nature of COVID-19, risk stratification is highly important for the appropriate allocation of resources at the health care system level. The present study describes the development and validation of a risk prediction model for AKI occurrence in a large cohort from a tertiary COVID-19 reference center in Poland.

We observed that pre-existing kidney disease was the most important predictor of AKI. This is consistent with the findings of a systematic review in which an estimated glomerular filtration rate below 60 ml/min/1.73 m² was identified as the strongest independent predictor of AKI incidence.²¹ Renal insufficiency has been consistently reported as a major determinant of AKI risk in COVID-19 patients, both in another meta-analysis²⁵ and in a large cohort study.²³

Other important model features, such as the requirement for oxygen therapy and circulatory failure, likely reflect the profile of critically ill patients, which is inherently linked to organ insufficiency (consistent with non–COVID-19 forms of AKI).²⁶ This may mirror successive events in the progression of pneumonia to acute respiratory distress syndrome (ARDS), in which positive pressure ventilation and hypoxemia promote inflammation and hemodynamic shifts that ultimately affect renal perfusion (hypercapnia also reduces renal blood flow).^25,27 Furthermore, the combination of AKI and ARDS is associated with poor prognosis.²⁸ While the angiotensin-converting enzyme 2 receptor for SARS-CoV-2 is upregulated in individuals with COVID-19,²⁹ and the use of renin-angiotensin-aldosterone (RAA) inhibitors appears to increase the risk of renal injury,³⁰ the association between AKI and RAA-treated hypertension is inconsistent across studies.^25,31 However, hypertension in general is strongly associated with AKI occurrence.^25,32

Age is another important variable that likely carries embedded information, as elderly individuals tend to have higher values of inflammatory indices and the prognostically important leukocyte shift³³ toward lymphocytopenia and neutrophilia,³⁴ all of which were significant features in our model breakdown.

While heightened inflammation, which can in some cases be aberrant and uncontrolled, is recognized as an illness-driving process,³⁵ evidence on the association between AKI risk and CRP levels is conflicting.²⁵ In our model, both CRP and PCT values increased the predictive value. The kidney is an organ frequently affected by an overflow of proinflammatory molecules during systemic inflammation. Septic patients have more severe acute kidney injury, require RRT more frequently, and are at a higher risk of sustained kidney damage.³⁶ Data from animal models suggest that several cell types drive the underlying cycle of injury and repair; however, understanding of the pathways that drive AKI in humans is still limited.³⁷ Overall, the purported risk factors for AKI development are consistent with the features included in our models (eg, age,^21,23,38 hypertension,^23,38 neutrophil count,²¹ cardiovascular disease,³⁴ and need for oxygen therapy^34,38).

According to our data collected from 4630 patients with a severe course of COVID-19, the highest risk of AKI was observed in individuals with CKD, a history of previous AKI episodes, or a history of kidney transplantation. Moreover, the need for oxygen therapy, hypertension, and circulatory failure also increased the risk of AKI. Our clinical observation, confirmed by the available data, revealed 3 main laboratory parameters associated with an increased risk of AKI: PCT, neutrophil count, and myoglobin concentration. This high-risk profile was reflected in our model. As this was a large and heterogeneous cohort, the availability of parameters typically assessed in the intensive care unit was very low. This precluded calculation of risk scores such as the Acute Physiology and Chronic Health Evaluation (APACHE), the Simplified Acute Physiology Score (SAPS), or even their simplified versions.

To translate this model into pragmatic clinical practice, we created a separate, simplified model that includes only universally available biochemical parameters (eg, excluding IL-6 and PCT). Within this model, myoglobin and CRP concentrations were the main laboratory features contributing to prediction. Prior meta-analyses have consistently demonstrated that an elevated myoglobin level is related to a higher risk of severe disease and mortality in COVID-19 patients.³⁹ Its concentration appears to be associated with COVID-19–related adverse outcomes in general. Both myoglobin and CRP have been identified as specific risk factors for mortality and are intertwined with the risk of organ failure in COVID-19.³⁹ Furthermore, a conceptual model predicted that motifs from SARS-CoV-2 structural proteins could bind to porphyrin, whereas experimental data indicated that the S/N protein binds to hemoglobin and myoglobin.⁴⁰ Myoglobin is a well-known contributor to the pathogenesis of AKI, which underscores the potential interplay between COVID-19, myoglobin, and AKI.

In the presented models, diabetes mellitus was an important variable associated with an enhanced risk of complications, extended hospital stay, and mortality in COVID-19 inpatients. Previous reports have shown that the resultant hyperglycemia is significantly associated with increased morbidity and mortality.⁴¹ Some studies have also reported a markedly higher rate of new-onset diabetes mellitus in individuals with COVID-19, which may originate from an aberrant immune-mediated response, whether directly or indirectly tied to the cytopathic viral effects.⁴¹

From a global perspective, the need for respiratory support, kidney disease status, and levels of myoglobin and markers of inflammation were the most important variables in our model predicting incident AKI. While predictive models should not be used to assess cause-and-effect relationships, as cautioned elsewhere, several clinical implications follow. Because the adopted model (ie, an RF algorithm) is an ensemble of decision trees, we can attempt to understand (eg, using ceteris paribus profiles for a single individual or clinical scenario) how the model arrives at a given prediction at the level of an individual patient. For example, kidney disease is a major factor that drives model predictions, and its absence illustrates how other factors gain greater significance (Figure 3). A similar pattern is likely present in patient stratification (by the model) based on the degree of respiratory support required or elevated levels of inflammatory (PCT, neutrophil count) or kidney injury (myoglobin) indices, as inferred from the global variable importance calculations. On a clinical level, such reasoning is consistent and theoretically justifiable. However, because of the way an RF model is constructed, it is difficult to understand how it adjudicates an individual clinical case, given that its predictions are averaged over many trees rather than derived from a single hierarchy of decisions. On the other hand, tree-based models often outperform conventional approaches (eg, multivariable logistic regression) and even deep learning techniques.

Machine learning (ML) models are a valuable tool for prediction in large datasets with multiple variables. Studies have reported successful ML-based approaches to predicting AKI in hospitalized cohorts (both surgical candidates^42,43 and inpatients in general). A variety of analytic tools are used in these reports, with RF models usually performing best.⁴⁴ This study is among the first^45,46 to develop a reliable prognostic model for AKI in COVID-19 patients. Identifying groups at high risk of AKI allows for individual triage into, for example, a more stringent follow-up schedule. This may enable more timely initiation of supportive care or RRT (as documented in prior AKI trials), which can improve renal outcomes and survival.¹² However, the external validity of every model requires extensive validation and, often, recalibration.

In clinical practice, our model predicts the incidence of AKI independently of 3 main factors: 1) period of SARS-CoV-2 infection, 2) subtype of virus, and 3) vaccination status of a patient.

The model developed in this study showed consistent and satisfactory predictive performance, which is supported by additional validation on a temporal test set derived to evaluate time-related differences throughout the early COVID-19 pandemic (eg, changing vaccination rates, prophylaxis, and treatment). Moreover, the final set of model features consisted of parameters that are readily available in most centers, namely the disease history, the current patient state, and 3 basic laboratory parameters, and that are clinically justified as predictors from a theoretical standpoint. Similar models for AKI prediction in the literature have shown variable performance, which often depends on the setting (eg, population, predictor set).^47,48 For example, models using variables collected within the first 24 hours performed worse (AUC, 0.64–0.77).^47,48 Considering the complexity of the clinical problem at hand, the predictive capability of our model is satisfactory, though validation in other cohorts remains necessary before further extrapolation. A preliminary study of its real-life performance is currently underway.

Several limitations of this study should be discussed. Developing a model based on data from a large but single-center cohort carries inherent bias due to cross-population heterogeneity. According to a prior systematic review, most prognostic prediction models for COVID-19 outcomes are of low quality and have a high risk of bias.⁴⁹ Another limitation is the change in epidemiologic trends related to viral strain evolution, successive vaccination, and divergent modes of vaccination and prophylaxis, which act as potential covariates. From a prognostic standpoint, we assume that the relationship between vaccination and outcome has been captured within the subgroup of patients hospitalized in later periods and should be similar overall. In the model breakdown, vaccination was not a highly significant contributor to model predictions, which leaves the predictions relatively unaffected by vaccination status, as suggested by the temporal validation results (Figure 1; Supplementary material, Figure S1).

As stated previously, we performed additional validation of the model using the temporal order of admission. Because treatment trends and the availability of different medicines changed rapidly throughout the pandemic, the time of admission is closely related to the predominant viral strain, vaccination rate, and treatment strategy of choice at a given time. Model predictions were consistent (ie, relatively stable) and predictive performance was satisfactory over time, which provides some support for model robustness with respect to variability in these important clinical factors.

Transition of our models toward clinical use should be approached with care due to the possibility of future data shift. To some extent, we attempted to control the time-related bias through time-related validation, which suggested satisfactory performance. However, the epidemiologic variability of SARS-CoV-2 strains and changing clinical profile of inpatients require repeated validation of this tool.⁵⁰

Conclusions

We demonstrated that an RF model based on clinical characteristics and routine laboratory tests can adequately predict hospital-acquired AKI among inpatients with SARS-CoV-2 infection. Internal testing, including temporal validation, provided preliminary confirmation of the consistent performance of the CRACoV-AKI tool, while model breakdown and analysis of dependence profiles yielded predictions consistent with clinical reasoning. To translate this model into clinical practice, we developed a simplified, clinically oriented model based on readily available parameters. From a translational perspective, a high-risk AKI status can be ascertained by evaluating clinical parameters divided into 3 main groups: 1) history of kidney disease (prior AKI, prevalent CKD, or status post kidney transplantation), 2) general morbidity (hypertension, circulatory failure, and/or need for respiratory support), and 3) selected contributory laboratory abnormalities (raised PCT, altered neutrophil count, and/or elevated myoglobin levels).

Supplementary material

Correspondence to

Marcin Krzanowski, MD, PhD, Department of Rheumatology and Immunology, Jagiellonian University Medical College, ul. Jakubowskiego 2, 30-688 Kraków, Poland, phone: +48 12 400 10 03, email: mkrzanowski@op.pl

Received

November 15, 2023.

Revision accepted

March 5, 2024.

Published online

March 14, 2024.

Acknowledgments

The CRACoV-HHS Study Investigators: Adamczyk Michalina, Andrychiewicz Anna, Antczak Jakub, Banaszkiewicz Małgorzata, Barańska Ilona, Bartuś Stanisław, Bednarek Agnieszka, Bednarek Joanna, Bętkowska-Korpała Barbara, Bień Artur Igor, Bociąga-Jasik Monika, Brandt Łukasz, Brudło Michał, Bryll Amira, Bryniarski Leszek, Bryniarski Paweł, Brzychczy Barbara, Brzychczy-Włoch Monika, Bugajski Janusz, Bujak-Giżycka Beata, Burliga Tomasz, Celejewska-Wójcik Natalia, Chatys-Bogacka Żaneta, Cholewczuk Agnieszka, Chromik-Legień Anna, Chrzan Robert, Chyrchel Bernadeta, Chyrchel Michał, Ciesielska Kinga, Cyranka Katarzyna, Czaikivska Zlata, Czepiel Jacek, Czepiel Klaudia, Czyżycki Mateusz, Ćwięk Aleksandra, Dembe Katarzyna, Dembiński Marcin, Drożdż Tomasz, Dudek Aleksandra, Dudek Dominika, Dwojak Mateusz, Dziewierz Artur, Dzieża-Grudnik Anna, Fedyk-Łukasik Małgorzata, Fiema Mateusz, Furman Katarzyna, Gacek Magdalena, Gajda Mateusz, Garlicki Aleksander, Garlicki Jarosław, Gąsowski Jerzy, Gołasa-Szczepaniak Paulina, Gosiewski Tomasz, Górka Karolina, Gradek-Kwinta Elżbieta, Gregorczyk-Maga Iwona, Grodzicki Tomasz, Gross-Sondej Iwona, Gryglewska Barbara, Hajek Agnes, Hartwich Patryk, Hohendorff Jerzy, Huras Hubert, Jachowicz Estera, Jagiełła Jeremiasz, Jagiełło Wojciech, Kamińska Barbara, Kania Aleksander, Kania Michał, Kapusta Przemysław, Karcz Paulina, Kasper Łukasz, Kasprzycki Karol, Kasprzyk Jakub, Katra Barbara, Kędzierska Jolanta, Kępińska-Wnuk Alicja, Kęsek Tomasz, Kiepura Anna, Kijowska Violetta, Klocek Marek, Klupa Tomasz, Kołak Magdalena, Kopeć Jolanta, Kopka Marianna, Kostrzycka Małgorzata, Kowina Natalia, Koźmiński Wojciech, Krawczyk Jacek, Krzanowska Katarzyna, Krzanowski Marcin, Krzyściak Paweł, Krzyściak Wirginia, Kukla Michał, Kusak Piotr, Laskowska-Wronarowicz Anna, Lechowicz Patrycja, Liberacka Donata, Lichołai Sabina, Lickiewicz Beata, Lorkowska-Zawicka Barbara, Łomnicka Magdalena, Łukasik Stanisław, Mach Krzysztof, Madej Józef, Majka Wojciech, Major Piotr, Małecki Maciej, Marona Monika, Matyja Andrzej, Maziarz Barbara, Mazur Konad, Mazurkiewicz Iwona, Motyl Maciej, Mrugacz Marcin, Mydel Krzysztof, Nastałek Paweł, Niezabitowska Karolina, Noga Magdalena, Nowak Klaudia, Nowakowski Michał, Olszanecka Agnieszka, Olszanecki Rafał, Olszewska-Turek Katarzyna, Ostrowski Wojciech, Pałczyńska Ewa, Pałka Anna, Pastuszak-Draxler Anna, Pawela Małgorzata, Pawliński Łukasz, Perera Ian, Petkow-Dimitrow Paweł, Pędziwiatr Michał, Piątek Anna, Piętak Ewelina, Pilecki Maciej, Piotrowicz Karolina, Podolski Adrian, Polok Kamil, Popiela Tadeusz, Pośpiech Kamila, Przybyszowski Marek, Puchalska Karolina, Pułyk Agnieszka, Pyrć Krzysztof, Pytel Krzysztof, Rajzer Marek, Rakowski Tomasz, Rojek-Zakrzewska Danuta, Romaniszyn Dorota, Różańska Anna, Rudnik Gabriela, Rudzki Łukasz, Rybicka Monika, Rymarowicz Justyna, Rzemińska Agnieszka, Rzeszutko Łukasz, Rzeźnik Monika, Salamon Dominika, Sanak Marek, Sarna-Palacz Dominika, Sawczyńska Katarzyna, Sepioło Anna, Sewiło Jakub, Siwiec Andżelika, Skalska Małgorzata, Skalska-Świstek Małgorzata, Skóra Magdalena, Sładek Krzysztof, Słowik Agnieszka, Sroka-Oleksiak Agnieszka, Stachowicz Aneta, Stachura Tomasz, Starowicz-Filip Anna, Starzyk Malwina, Stolarz-Skrzypek Katarzyna, Strach Magdalena, Struś Michał, Sulicka-Grodzicka Joanna, Surdacki Andrzej, Surowiec Paulina, Suski Maciej, Sydor Wojciech, Szaleniec Joanna, Szczerbińska Katarzyna, Szwajca Marta, Śmierciak Natalia, Talaga-Ćwiertnia Katarzyna, Terlecki Michał, Tokarczyk Zuzanna, Tomik Jerzy, Totoń-Żurańska Justyna, Trojan-Królikowska Anna, Turek Aleksander, Ucieklak Damian, Urbanik Andrzej, Walczewska Jolanta, Wężyk Kamil, Widera Alicja, Wierdak Mateusz, Wilk Magdalena, Winiarski Marek, Witek Przemysław, Wizner Barbara, Włodarczyk Małgorzata, Wnuk Marcin, Wojciechowska Wiktoria, Woroń Jarosław, Wójcik Krzysztof, Wójkowska-Mach Jadwiga, Wrona Paweł, Zarzecka-Francica Elżbieta, Zarzecka Joanna, Zawadzka Barbara, Zięba-Parkitny Joanna, Żółtowska Barbara, Żurowicz Bożena.

Funding

This publication was supported by the National Center for Research and Development CRACoV-HHS project (Model of multi-specialist hospital and non-hospital care for patients with SARS-CoV-2 infection) through the initiative “Support for specialist hospitals in fighting the spread of SARS-CoV-2 infection and in treating COVID-19” (SZPITALE JEDNOIMIENNE/18/2020). The presented research was implemented by a consortium of the University Hospital in Kraków and the Jagiellonian University Medical College.

Contribution statement

From a clinical standpoint, feature selection was based on expert advice from KK, MK, KN, TG, MM, MB-J, MR, KS, and BW. From a statistical perspective, KW, PB, and KB advised on model development and fine-tuning.

Conflict of interest

None declared.

How to cite

Krzanowska K, Batko K, Niezabitowska K, et al. Predicting acute kidney injury onset with a random forest algorithm using electronic medical records of COVID-19 patients: the CRACoV-AKI model. Pol Arch Intern Med. 2024; 134: 16697. doi:10.20452/pamw.16697

1. Chen N, Zhou M, Dong X, et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet. 2020; 395: 507-513.
2. Biancolella M, Colona VL, Mehrian-Shai R, et al. COVID-19 2022 update: transition of the pandemic to the endemic phase. Hum Genomics. 2022; 16: 19.
3. Guan W-J, Ni Z-Y, Hu Y, et al. Clinical characteristics of coronavirus disease 2019 in China. N Engl J Med. 2020; 382: 1708-1720.
4. Xu Z, Tang Y, Huang Q, et al. Systematic review and subgroup analysis of the incidence of acute kidney injury (AKI) in patients with COVID-19. BMC Nephrol. 2021; 22: 52.
5. Nash K, Hafeez A, Hou S. Hospital-acquired renal insufficiency. Am J Kidney Dis. 2002; 39: 930-936.
6. Shusterman N, Strom BL, Murray TG, et al. Risk factors and outcome of hospital-acquired acute renal failure. Clinical epidemiologic study. Am J Med. 1987; 83: 65-71.
7. Praught ML, Shlipak MG. Are small changes in serum creatinine an important risk factor? Curr Opin Nephrol Hypertens. 2005; 14: 265-270.
8. Silver SA, Chertow GM. The economic consequences of acute kidney injury. Nephron. 2017; 137: 297-301.
9. Khwaja A. KDIGO clinical practice guidelines for acute kidney injury. Nephron Clin Pract. 2012; 120: 179-184.
10. Moore PK, Hsu RK, Liu KD. Management of acute kidney injury: core curriculum 2018. Am J Kidney Dis. 2018; 72: 136-148.
11. Kashani K, Rosner MH, Haase M, et al. Quality improvement goals for acute kidney injury. Clin J Am Soc Nephrol. 2019; 14: 941-953.
12. Meersch M, Schmidt C, Hoffmeier A, et al. Prevention of cardiac surgery-associated AKI by implementing the KDIGO guidelines in high risk patients identified by biomarkers: the PrevAKI randomized controlled trial. Intensive Care Med. 2017; 43: 1551-1561.
13. Bociąga-Jasik M, Wojciechowska W, Terlecki M, et al. Comparison between COVID-19 outcomes in the first 3 waves of the pandemic: a reference hospital report. Pol Arch Intern Med. 2022; 132: 16286.
14. KDIGO Clinical practice guideline for acute kidney injury. Kidney Int Suppl. 2012; 2: 1-138.
15. Sarica A, Cerasa A, Quattrone A. Random forest algorithm for the classification of neuroimaging data in Alzheimer’s disease: a systematic review. Front Aging Neurosci. 2017; 9: 329.
16. van Buuren S, Groothuis-Oudshoorn K. Multivariate imputation by chained equations in R. J Stat Softw. 2011; 45: 1-67.
17. Lang M, Binder M, Richter J, et al. A modern object-oriented machine learning framework in R. J Open Source Softw. 2019; 4: 1903.
18. Cawley GC, Talbot NLC. On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res. 2010; 11: 2079-2107.
19. Biecek P. DALEX: explainers for complex predictive models in R. J Mach Learn Res. 2018; 19: 1-5.
20. Fisher A, Rudin C, Dominici F. All models are wrong, but many are useful: learning a variable’s importance by studying an entire class of prediction models simultaneously. J Mach Learn Res. 2019; 20: 177.
21. Jewell PD, Bramham K, Galloway J, et al. COVID-19-related acute kidney injury; incidence, risk factors and outcomes in a large UK cohort. BMC Nephrol. 2021; 22: 359.
22. Raina R, Mahajan ZA, Vasistha P, et al. Incidence and outcomes of acute kidney injury in COVID-19: a systematic review. Blood Purif. 2022; 51: 199-212.
23. Bowe B, Cai M, Xie Y, et al. Acute kidney injury in a national cohort of hospitalized US veterans with COVID-19. Clin J Am Soc Nephrol. 2020; 16: 14-25.
24. Heung M, Steffick DE, Zivin K, et al. Acute kidney injury recovery pattern and subsequent risk of CKD: an analysis of veterans health administration data. Am J Kidney Dis. 2016; 67: 742-752.
25. Zhang J, Pang Q, Zhou T, et al. Risk factors for acute kidney injury in COVID-19 patients: an updated systematic review and meta-analysis. Ren Fail. 2023; 45: 2170809.
26. Hoste EAJ, Kellum JA, Selby NM, et al. Global epidemiology and outcomes of acute kidney injury. Nat Rev Nephrol. 2018; 14: 607-625.
27. Darmon M, Clec’h C, Adrie C, et al. Acute respiratory distress syndrome and risk of AKI among critically ill patients. Clin J Am Soc Nephrol. 2014; 9: 1347-1353.
28. Tomasi A, Song X, Gajic O, et al. Kidney and lung crosstalk during critical illness: large-scale cohort study. J Nephrol. 2023; 36: 1037-1046.
29. Su H, Yang M, Wan C, et al. Renal histopathological analysis of 26 postmortem findings of patients with COVID-19 in China. Kidney Int. 2020; 98: 219-227.
30. Gnanenthiran SR, Borghi C, Burger D, et al. Renin-angiotensin system inhibitors in patients with COVID-19: a meta-analysis of randomized controlled trials led by the International Society of Hypertension. J Am Heart Assoc. 2022; 11: e026143.
31. Morales DR, Conover MM, You SC, et al. Renin-angiotensin system blockers and susceptibility to COVID-19: an international, open science, cohort analysis. Lancet Digit Health. 2021; 3: 98-114.
32. Cai X, Wu G, Zhang J, et al. Risk factors for acute kidney injury in adult patients with COVID-19: a systematic review and meta-analysis. Front Med (Lausanne). 2021; 8: 719472.
33. Huang G, Kvalic AJ, Graber CJ. Prognostic value of leukocytosis and lymphopenia for coronavirus disease severity. Emerg Infect Dis. 2020; 26: 1839-1841.
34. Zhang B, Zhou X, Qiu Y, et al. Clinical characteristics of 82 cases of death from COVID-19. PLoS One. 2020; 15: e0235458.
35. Rabaan AA, Al-Ahmed SH, Muhammad J, et al. Role of inflammatory cytokines in COVID-19 patients: a review on molecular mechanisms, immune functions, immunopathology and immunomodulatory drugs to counter cytokine storm. Vaccines. 2021; 9: 436.
36. Piccinni P, Cruz DN, Gramaticopolo S, et al. Prospective multicenter study on epidemiology of acute kidney injury in the ICU: a critical care nephrology Italian collaborative effort (NEFROINT). Minerva Anestesiol. 2011; 77: 1072-1083.
37. Rabb H, Griffin MD, McKay DB, et al. Inflammation in AKI: current understanding, key questions, and knowledge gaps. J Am Soc Nephrol. 2016; 27: 371-379.
38. Hirsch JS, Ng JH, Ross DW, et al. Acute kidney injury in patients hospitalized with COVID-19. Kidney Int. 2020; 98: 209-218.
39. Ali A, Noman M, Guo Y, et al. Myoglobin and C-reactive protein are efficient and reliable early predictors of COVID-19 associated mortality. Sci Rep. 2021; 11: 5975.
40. Dyankov G, Genova-Kalou P, Eftimov T, et al. Binding of SARS-CoV-2 structural proteins to hemoglobin and myoglobin studied by SPR and DR LPG. Sensors (Basel). 2023; 23: 3346-3357.
41. Nassar M, Daoud A, Nso N, et al. Diabetes mellitus and COVID-19: review article. Diabetes Metab Syndr. 2021; 15: 102268-102276.
42. Kerr KF, Morenz ER, Roth J, et al. Developing biomarker panels to predict progression of acute kidney injury after cardiac surgery. Kidney Int Rep. 2019; 4: 1677-1688.
43. Zhou C, Wang R, Jiang W, et al. Machine learning for the prediction of acute kidney injury and paraplegia after thoracoabdominal aortic aneurysm repair. J Card Surg. 2020; 35: 89-99.
44. Vagliano I, Chesnaye NC, Leopold JH, et al. Machine learning models for predicting acute kidney injury: a systematic review and critical appraisal. Clin Kidney J. 2022; 15: 2266-2280.
45. McAdams MC, Xu P, Saleh SN, et al. Risk prediction for acute kidney injury in patients hospitalized with COVID-19. Kidney Med. 2022; 4: 100463.
46. Palomba H, Cubos D, Bozza F, et al. Development of a risk score for AKI onset in COVID-19 patients: COV-AKI score. BMC Nephrol. 2023; 24: 46.
47. Hu Y, Liu K, Ho K, et al. A simpler machine learning model for acute kidney injury risk stratification in hospitalized patients. J Clin Med. 2022; 11: 5688.
48. Argyropoulos A, Townley S, Upton PM, et al. Identifying on admission patients likely to develop acute kidney injury in hospital. BMC Nephrol. 2019; 20: 56.
49. Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. BMJ. 2020; 369: m1328.
50. Camerer CF, Dreber A, Holzmeister F, et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat Hum Behav. 2018; 2: 637-644.