Screening and diagnosis
Artzi et al., 2020 [4]
- Retrospective nationwide electronic health record data of 588,622 pregnancies from 368,351 women in Israel between 2010 and 2017, including demographics, anthropometrics, laboratory tests, diagnoses, and pharmaceuticals
- Aim: to establish an ML model that improves the prediction of gestational diabetes from electronic health records compared with a conventional screening tool
- Internal validation set (n = 137,220; with geotemporal difference)
- Reference labels: gestational diabetes diagnosed by a two-step approach (glucose challenge test and oral glucose tolerance test at 24 to 28 weeks of gestation)
- Comparator: National Institutes of Health seven-item questionnaire
- Methods: supervised learning; gradient boosting model
Key implications
- ML was used to derive, from the large electronic health record dataset, a simple nine-question model in a self-reportable format that outperformed the current standard screening tool (AUROC 0.80 vs. 0.68).
- May facilitate early-stage interventions for women at high risk for gestational diabetes
- May aid the construction of a selective, cost-effective screening approach based on predicted gestational diabetes risk, instead of the current universal screening approach
Limitations
- Inherent bias from retrospective review of electronic health record data
- Performance might differ when the model is applied to actual self-reported surveys.
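
AUROC, the metric behind the 0.80 vs. 0.68 comparison above, is the probability that a randomly chosen woman who developed gestational diabetes receives a higher predicted risk than a randomly chosen woman who did not. A minimal Python sketch of that rank-based (Mann-Whitney) definition follows; it is not Artzi et al.'s code, and the function name `auroc` and the toy scores are hypothetical.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the Mann-Whitney probability that a positive case
    (label 1) outscores a negative case (label 0); ties count one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks; 1 = later diagnosed with gestational diabetes.
print(auroc([0.9, 0.4, 0.3, 0.6, 0.2], [1, 1, 0, 0, 0]))  # 5 of 6 pairs ranked correctly
```

The pairwise loop is O(n²) and only meant to show the definition; on a dataset of hundreds of thousands of pregnancies, a sorting-based routine such as scikit-learn's `roc_auc_score` would be the practical choice.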
|
Perakakis et al., 2019 [11] |
- Serum samples from 49 healthy subjects and 31 patients with biopsy-proven nonalcoholic fatty liver disease (NAFLD)
- Aim: to train models for the non-invasive diagnosis of nonalcoholic steatohepatitis (NASH) and liver fibrosis based on circulating lipids, glycans, and fatty acids identified by liquid chromatography with tandem mass spectrometry (LC-MS/MS), together with biochemical parameters
- Internal validation with three-fold cross-validation
- Reference label: biopsy-proven NAFLD
- Comparator: not applicable
- Methods: supervised learning; one-vs-rest nonlinear support vector machine models with recursive feature elimination
Key implications
- The ML model with 20 features, comprising lipidomic features, glycans, and adiponectin, achieved an accuracy of up to 90% in discriminating healthy individuals from patients with NAFLD and NASH.
- May provide a low-risk, cost-effective, non-invasive alternative to liver biopsy
Limitations
- An independent validation cohort was not available.
- Needs further validation in a different population
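
Three-fold cross-validation, as used here for internal validation, rotates each third of the samples into the test role once. The sketch below assumes the folds are stratified by class (healthy vs. NAFLD) so that each fold keeps roughly the 49:31 ratio; whether Perakakis et al. stratified their folds is not stated above, and `stratified_k_fold` is a hypothetical helper.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_k_fold(
    labels: Sequence[int], k: int = 3, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs, dealing the samples of
    each class round-robin across folds so class ratios stay similar."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0  # carry the round-robin position across classes to balance fold sizes
    for members in by_class.values():
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[(offset + pos) % k].append(idx)
        offset += len(members)
    splits = []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


# Hypothetical labels matching the cohort sizes: 0 = healthy (49), 1 = NAFLD (31).
labels = [0] * 49 + [1] * 31
for train, test in stratified_k_fold(labels, k=3):
    print(len(train), len(test))
```

Any feature selection step, such as recursive feature elimination, would need to be refit on each training fold only; selecting features on all 80 samples first would leak test information into the accuracy estimate.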
Risk prediction |
Segar et al., 2019 [12] |
- 8,756 patients without heart failure at baseline from the ACCORD trial dataset (trial conducted between 1999 and 2009; 50% training set, 50% internal validation set)
- Aim: to develop an ML model to predict incident heart failure among patients with type 2 diabetes
- External validation set: 10,819 participants without prevalent heart failure from the ALLHAT trial
- Reference label: incident hospitalization or death due to heart failure (captured and adjudicated by two independent physician reviewers during the trial)
- Comparator: not applicable
- Methods: supervised learning; random survival forest-based model
Key implications
- The ML-based models showed modest performance in predicting incident heart failure among patients with type 2 diabetes in the external validation set (C-index 0.70 to 0.74).
- Each 1-unit increment in the WATCH-DM score was associated with a 24% higher relative risk of heart failure within 5 years.
- Strength: analysis of a large number of participants from a well-phenotyped clinical trial population
Limitations
- Discrimination for heart failure with preserved ejection fraction was relatively low in the subgroup analysis.
- Temporal changes in heart failure biomarkers and medications could not be reflected in the model.
- The model needs validation in lower-risk cohorts of individuals with type 2 diabetes.
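
The C-index values reported for the external validation set (0.70 to 0.74) measure how often the model assigns higher risk to a patient who has a heart failure event earlier than to one who stays event-free for longer. A minimal sketch of Harrell's concordance index, one common form of this statistic, follows; which concordance estimator Segar et al. used is not specified above, and the follow-up data are hypothetical.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Harrell's C for right-censored data.

    A pair is comparable when the subject with the shorter follow-up had the
    event (event == 1). It is concordant when that subject also has the higher
    predicted risk; tied risks count one half. Pairs with tied times are
    skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up (years), heart failure events, and model risk scores.
print(harrell_c_index([2, 4, 6, 8], [1, 0, 1, 0], [0.9, 0.3, 0.5, 0.7]))
```

Here three of the four comparable pairs are ordered correctly, so C = 0.75; a value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.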
Basu et al., 2018 [13] |
- 10,251 ACCORD trial participants aged 40 to 79 years with type 2 diabetes, HbA1c of 7.5% or higher, and cardiovascular disease or cardiovascular risk factors, who were randomized to a target HbA1c of < 6.0% (intensive) vs. 7.0% to 7.9% (standard)
- Aim: to identify subgroups with heterogeneous treatment effects in response to intensive glycemic therapy
- Reference label: treatment effect, defined as the absolute difference in the all-cause mortality rate between the intensive and standard therapy groups
- Comparator: not applicable
- Methods: supervised learning; gradient forest analysis
Key implications
- Whereas intensive vs. standard therapy increased mortality by 3.7% in group 4, intensive therapy reduced mortality by 2.3% in group 1 (95% CI, -0.2% to 4.5%), in contrast with the main trial result of higher mortality under intensive therapy.
- Identified characteristics of patients who may have benefited from intensive glycemic therapy (younger individuals with a relatively low hemoglobin glycation index)
- Offered an example of finding clinically meaningful subgroups with heterogeneous treatment effects in randomized trial data
Limitations
- Post hoc analysis of a single trial conducted before the development of recent diabetes medications with cardiovascular benefits
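
Basu et al.'s reference label is an absolute mortality difference between arms, which is also how the subgroup results above are expressed (for example, a 2.3% reduction in group 1). The sketch below computes that quantity for a single subgroup from raw counts with a Wald 95% interval; the interval method, the function name, and the counts are assumptions, and it does not reproduce the gradient forest, which estimates such differences as a function of patient covariates rather than for one predefined subgroup.

```python
import math
from typing import Tuple


def mortality_risk_difference(
    deaths_standard: int, n_standard: int,
    deaths_intensive: int, n_intensive: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Absolute all-cause mortality difference (standard minus intensive)
    with a Wald 95% CI. Positive values mean fewer deaths with intensive therapy."""
    p_std = deaths_standard / n_standard
    p_int = deaths_intensive / n_intensive
    diff = p_std - p_int
    se = math.sqrt(p_std * (1 - p_std) / n_standard + p_int * (1 - p_int) / n_intensive)
    return diff, diff - z * se, diff + z * se


# Hypothetical subgroup counts, not ACCORD data.
diff, lo, hi = mortality_risk_difference(50, 500, 40, 500)
print(f"{diff:.1%} (95% CI, {lo:.1%} to {hi:.1%})")
```

As with the group 1 estimate above, a point estimate favoring intensive therapy can still come with an interval that crosses zero when the subgroup is small.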
Oroojeni et al., 2019 [5] |
- Medical records of 87 patients with type 1 diabetes from Massachusetts General Hospital; data from each patient's visits over a 10-year period between 2003 and 2013 (training set), including HbA1c, body mass index, activity level, alcohol use status, and insulin (Lantus) dose
- Aim: to explore an effective reinforcement learning framework for determining the optimal long-acting insulin dose for patients with type 1 diabetes
- External validation with 60 cases
- Reference label: physician-prescribed insulin dose
- Comparator: not applicable
- Methods: reinforcement learning; Q-learning with a reward function based on HbA1c status at the visit and the change in HbA1c from the previous visit
Key implications
- The physician-prescribed insulin dose was within the dosing interval recommended by the Q-learning algorithm in 88% of test cases.
- A proof-of-concept study applying a reinforcement learning algorithm to provide clinical decision support for insulin dosing in patients with type 1 diabetes
Limitations
- Lifestyle information on diet, stress, and medication adherence was omitted.
- Relatively small training set
- Only one type of insulin (Lantus) was examined in the model.
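
The Q-learning approach described for this study learns, from logged visits, a value for each (HbA1c state, dose action) pair using a reward built from the HbA1c level at a visit and its change since the previous visit. The Python sketch below shows tabular Q-learning under heavily simplified assumptions: the three HbA1c states, the decrease/keep/increase action set (in place of a recommended dosing interval), the reward weights, and all function names and data are hypothetical, not taken from Oroojeni et al.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Hypothetical action set: direction of the long-acting insulin dose change.
ACTIONS = ("decrease", "keep", "increase")


def hba1c_state(hba1c: float) -> int:
    """Hypothetical discretisation of HbA1c (%) into three states."""
    if hba1c < 7.0:
        return 0
    if hba1c < 8.0:
        return 1
    return 2


def reward(hba1c_now: float, hba1c_prev: float) -> float:
    """Hypothetical reward: +1 for HbA1c below 7% at this visit, plus the
    HbA1c fall since the previous visit (negative if HbA1c rose)."""
    return (1.0 if hba1c_now < 7.0 else 0.0) + (hba1c_prev - hba1c_now)


# One record per consecutive visit pair: (HbA1c before, action, HbA1c after).
Visit = Tuple[float, str, float]


def fit_q_table(
    visits: List[Visit], alpha: float = 0.1, gamma: float = 0.9, sweeps: int = 2000
) -> DefaultDict[Tuple[int, str], float]:
    """Tabular Q-learning replayed repeatedly over logged visits (offline)."""
    q: DefaultDict[Tuple[int, str], float] = defaultdict(float)
    for _ in range(sweeps):
        for h_prev, action, h_next in visits:
            s, s_next = hba1c_state(h_prev), hba1c_state(h_next)
            target = reward(h_next, h_prev) + gamma * max(q[(s_next, a)] for a in ACTIONS)
            q[(s, action)] += alpha * (target - q[(s, action)])
    return q


def recommend(q: DefaultDict[Tuple[int, str], float], hba1c: float) -> str:
    """Greedy action for the current HbA1c state."""
    s = hba1c_state(hba1c)
    return max(ACTIONS, key=lambda a: q[(s, a)])


# Hypothetical visit history, not Massachusetts General Hospital data.
visits = [
    (8.5, "increase", 7.6),
    (7.6, "increase", 6.9),
    (8.4, "keep", 8.6),
    (7.5, "keep", 7.7),
    (6.8, "keep", 6.7),
    (6.9, "decrease", 7.4),
]
q = fit_q_table(visits)
print(recommend(q, 8.2), recommend(q, 6.8))
```

Actions never observed in a given state keep their default value of 0, which illustrates why a small training set and a single insulin type, as noted in the limitations, constrain what such a policy can recommend.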
Translational research |
Liu et al., 2020 [14] |
- 20 drug-naive individuals with prediabetes (discovery cohort)
- Aim: to develop an ML model that predicts exercise responsiveness from exercise-induced alterations in the gut microbiota
- Exercise responders and nonresponders were determined after 12 weeks of high-intensity exercise training.
- Feces were collected before and after the exercise period to analyze the gut microbiota profile.
- Internal validation with 10-fold cross-validation
- Reference label: responders defined as a decrease in the homeostatic model assessment of insulin resistance (HOMA-IR) greater than twice the technical error
- Comparator: not applicable
- Methods: supervised learning; random forest model
Key implications
- The ML model identified 14 microbial species and 15 fecal metabolites that predicted exercise responsiveness (AUROC 0.75 in the validation set).
- Provides an example of applying ML principles to a human-to-mouse translational study based on a microbiome dataset
Limitations
- Relatively small sample size
- Limited to Chinese males only
- Needs further validation in different populations
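
The responder label used in this study can be made concrete. HOMA-IR is conventionally computed as fasting glucose (mmol/L) × fasting insulin (µU/mL) / 22.5, and a participant counts as a responder when the exercise-induced fall exceeds twice the technical error. The sketch assumes the technical error is estimated from duplicate measurements with the Dahlberg formula; how Liu et al. estimated it is not stated above, and the function names and all values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def homa_ir(glucose_mmol_l: float, insulin_uU_ml: float) -> float:
    """HOMA-IR from fasting glucose (mmol/L) and fasting insulin (uU/mL)."""
    return glucose_mmol_l * insulin_uU_ml / 22.5


def technical_error(pairs: Sequence[Tuple[float, float]]) -> float:
    """Technical error of measurement from duplicate measurements
    (assumed Dahlberg formula): sqrt(sum(d^2) / (2n))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs)))


def is_responder(homa_before: float, homa_after: float, te: float) -> bool:
    """Responder if HOMA-IR falls by more than twice the technical error."""
    return homa_before - homa_after > 2 * te


# Hypothetical repeat baseline HOMA-IR measurements used to estimate the error.
te = technical_error([(4.0, 4.2), (3.0, 2.8), (5.1, 5.0)])
before = homa_ir(6.0, 15.0)  # 6.0 * 15.0 / 22.5 = 4.0
after = homa_ir(5.6, 10.0)
print(round(te, 3), is_responder(before, after, te))
```

Tying the threshold to measurement noise means only changes larger than day-to-day variability in HOMA-IR are labeled as a response, which is the label the random forest is then trained to predict.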