PhenoGenX – Model Performance Dashboard

Comprehensive evaluation of ensemble ML models for HIV-1 drug resistance prediction

Drugs Evaluated: 22 (across all ARV classes)
Training Samples: 42587 genotype-phenotype pairs
Best R²: 0.91, Tipranavir (TPV)
Best CRPS: 83.66, Ritonavir (RTV)

Cross-Validated Training Performance

Ensemble model performance metrics are based on 5-fold cross-validation. CRPS (Composite Ranking Performance Score) combines R², Spearman rank correlation, MAE, categorical accuracy, and stability into a single score for balanced model selection.
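As a rough illustration of the 5-fold scheme mentioned above, the sketch below splits sample indices into five disjoint test folds. The function name k_fold_indices, the seed, and the shuffling strategy are assumptions made for this example; the dashboard does not say how its folds are drawn.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        yield train, test

# 2233 is the ABC training-sample count reported on this dashboard.
splits = list(k_fold_indices(2233, k=5))
```

Every sample lands in exactly one test fold, so each prediction used for scoring comes from a model that never saw that sample during training.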

Model Performance by Drug
Drug   Best R²   R² Rating   Best CRPS   CRPS Rating   Unique Mutations   Training Samples
ABC    0.53      Good        64.2        Good          193                2233
D4T    0.76      Excellent   78.1        Excellent     209                2379
NVP    0.78      Excellent   78.3        Excellent     204                2454
3TC    0.84      Excellent   77.8        Excellent     255                2706
EFV    0.79      Excellent   76.4        Excellent     212                2567
AZT    0.74      Excellent   76.9        Excellent     207                2383
DDI    0.58      Good        72.6        Good          209                2380
TDF    0.70      Excellent   79.5        Excellent     180                2014
ETR    0.69      Good        70.2        Good          94                 1195
FTC    0.68      Good        73.4        Good          48                 556
EVG    0.74      Excellent   77.3        Excellent     89                 1624
RAL    0.72      Excellent   71.7        Good          93                 1690
DTG    0.78      Excellent   77.2        Excellent     74                 962
BIC    0.65      Good        69.7        Good          46                 609
NFV    0.58      Good        76.7        Excellent     197                2418
SQV    0.68      Good        71.8        Good          195                2368
FPV    0.81      Excellent   75.0        Good          195                2321
IDV    0.72      Excellent   77.7        Excellent     191                2361
RTV    0.81      Excellent   83.7        Excellent     187                2331
LPV    0.79      Excellent   81.5        Excellent     176                2048
TPV    0.91      Excellent   73.2        Good          122                1327
ATV    0.55      Good        74.6        Good          154                1661

Performance Classification Guidelines
CRPS Scoring System:
Excellent (CRPS > 75)

Top-quartile performance with high accuracy and stability across all metrics. Suitable for clinical decision support.

Good (CRPS 60-75)

Reliable performance with balanced metrics across all dimensions. Acceptable for clinical use with appropriate monitoring.

Needs Improvement (CRPS < 60)

Requires model refinement, additional training data, or feature engineering to achieve clinical reliability standards.
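
The three bands above can be written as a small lookup. This is a minimal sketch; the function name classify_crps is hypothetical, and treating a score of exactly 75 as Good follows the "CRPS 60-75" wording and the FPV row (75.0, Good).

```python
def classify_crps(score):
    """Map a CRPS value to the dashboard's performance band."""
    if score > 75:
        return "Excellent"
    if score >= 60:
        return "Good"
    return "Needs Improvement"

# Values taken from the per-drug table: RTV, FPV, ABC.
bands = [classify_crps(s) for s in (83.7, 75.0, 64.2)]
```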

About CRPS Scoring System

The Composite Ranking Performance Score (CRPS) is a comprehensive metric designed to evaluate machine learning models across multiple dimensions simultaneously. Unlike single metrics, CRPS provides a balanced assessment of model quality by integrating:

R² Score (20%)
Measures proportion of variance in phenotypic fold-change explained by the model. Higher values indicate better predictive power.
Spearman Rank Correlation (20%)
Assesses the monotonic relationship between predicted and actual values. Robust to outliers and able to capture non-linear (but monotonic) relationships.
Mean Absolute Error (20%)
Quantifies average magnitude of prediction errors in absolute terms. Lower values indicate higher precision.
Categorical Accuracy (20%)
Evaluates correct classification into resistance categories (Susceptible/Intermediate/Resistant). Reflects clinical utility.
Model Stability (20%)
Measures reproducibility across different random seeds and data splits. Ensures consistent performance in real-world applications.
Calculation Method: Each model is ranked across all drugs for each of the 5 metrics. Ranks are normalized to a 0-100 scale, then averaged with equal 20% weighting. CRPS > 75 indicates top-quartile performance, suitable for clinical decision support.
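
The calculation method can be sketched as follows. This is an illustration under stated assumptions, not the dashboard's implementation: the function names, the tie handling (ties share the average rank), and the example values are all hypothetical. MAE is treated as lower-is-better; the other four metrics as higher-is-better.

```python
def rank_score(values, higher_is_better=True):
    """Map raw metric values to 0-100 rank scores (best = 100, worst = 0)."""
    n = len(values)
    if n < 2:
        return [100.0] * n
    signed = values if higher_is_better else [-v for v in values]
    scores = []
    for v in signed:
        worse = sum(1 for w in signed if w < v)
        ties = sum(1 for w in signed if w == v) - 1  # exclude the entry itself
        scores.append(100.0 * (worse + 0.5 * ties) / (n - 1))
    return scores

def composite_score(metrics, lower_is_better=("mae",)):
    """Average the per-metric rank scores with equal weights."""
    names = list(metrics)
    per_metric = [rank_score(metrics[m], m not in lower_is_better) for m in names]
    n = len(metrics[names[0]])
    return [sum(col[i] for col in per_metric) / len(names) for i in range(n)]

# Hypothetical values for three entries; not taken from the dashboard.
metrics = {
    "r2":        [0.80, 0.60, 0.70],
    "spearman":  [0.90, 0.70, 0.80],
    "mae":       [0.20, 0.40, 0.30],
    "accuracy":  [0.85, 0.65, 0.75],
    "stability": [0.95, 0.75, 0.85],
}
scores = composite_score(metrics)
```

With five metrics, equal 20% weighting is the same as a plain mean of the five rank scores, which is what composite_score computes.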

Independent Clinical Validation

Categorical performance metrics from independent validation datasets. Sensitivity and specificity are reported separately for Susceptible (S) and Resistant (R) classifications to reflect differential clinical risks.

Validation Performance by Drug
Sens = sensitivity, Spec = specificity, PPV = positive predictive value, NPV = negative predictive value; (S) = Susceptible, (R) = Resistant; N = number of validation samples.

Drug   N      Sens(S)  Spec(S)  PPV(S)  NPV(S)  Sens(R)  Spec(R)  PPV(R)  NPV(R)
3TC    655    94%      85%      61%     98%     94%      98%      99%     93%
ABC    462    72%      91%      87%     80%     11%      99%      89%     53%
ATV    290    70%      84%      67%     86%     61%      83%      71%     76%
AZT    607    73%      96%      92%     84%     95%      67%      61%     96%
BIC    540    81%      77%      92%     55%     30%      99%      87%     88%
D4T    611    87%      76%      50%     96%     46%      95%      78%     83%
DDI    609    80%      75%      88%     63%     10%      99%      84%     77%
DTG    889    77%      81%      95%     41%     47%      97%      59%     94%
EFV    691    95%      55%      54%     95%     73%      97%      94%     83%
EVG    1532   97%      40%      3%      100%    72%      95%      86%     88%
FPV    797    92%      69%      37%     98%     72%      92%      86%     83%
IDV    801    94%      92%      90%     96%     36%      98%      95%     54%
LPV    501    91%      94%      87%     96%     81%      85%      85%     81%
NFV    836    89%      95%      90%     94%     86%      86%      83%     88%
NVP    706    98%      84%      88%     98%     83%      96%      94%     89%
RAL    1636   100%     56%      2%      100%    93%      85%      76%     96%
RTV    802    96%      80%      62%     98%     92%      87%      85%     93%
SQV    824    86%      89%      91%     83%     51%      94%      77%     81%
TDF    296    89%      54%      30%     96%     14%      94%      63%     62%
TPV    148    44%      88%      90%     36%     40%      68%      31%     75%

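For readers who want to reproduce per-class figures like these, the sketch below computes sensitivity, specificity, PPV, and NPV for one class against all others. Whether the dashboard uses exactly this one-vs-rest convention is an assumption; the function name and the example labels are hypothetical.

```python
def one_vs_rest_metrics(actual, predicted, positive):
    """Sensitivity, specificity, PPV and NPV treating `positive` as the target class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sens": ratio(tp, tp + fn),
        "spec": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Hypothetical labels (S = Susceptible, I = Intermediate, R = Resistant).
actual    = ["S", "S", "S", "I", "R", "R", "R", "R"]
predicted = ["S", "S", "I", "S", "R", "R", "R", "I"]
metrics_s = one_vs_rest_metrics(actual, predicted, "S")
metrics_r = one_vs_rest_metrics(actual, predicted, "R")
```
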
Clinical Interpretation Guidance

For Susceptible (S) Classification: High NPV(S) ensures patients predicted as susceptible are truly susceptible, reducing risk of ineffective treatment. NPV(S) > 95% indicates excellent safety profile.

For Resistant (R) Classification: High Sens(R) ensures resistance cases are not missed, preventing treatment failure. Sens(R) > 90% is ideal for clinical use.


Color Coding: Green (>70%), Yellow (50-70%), Red (<50%)
Key Safety Metrics:
  • Focus on NPV(S) for safe treatment selection
  • Prioritize Sens(R) for resistance detection
  • PPV(R) critical for avoiding unnecessary treatment changes
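
The color bands and the two headline safety thresholds (NPV(S) > 95%, Sens(R) > 90%) can be expressed as simple checks. This is a sketch only; the function names are hypothetical, and the sample values are copied from the validation table.

```python
def color_band(pct):
    """Dashboard color for a percentage metric."""
    if pct > 70:
        return "Green"
    if pct >= 50:
        return "Yellow"
    return "Red"

def safety_flags(npv_s, sens_r):
    """Check the two guidance thresholds stated above."""
    return {"npv_s_excellent": npv_s > 95, "sens_r_ideal": sens_r > 90}

# (NPV(S), Sens(R)) pairs from the validation table.
rows = {"3TC": (98, 94), "AZT": (84, 95), "TPV": (36, 40)}
flags = {drug: safety_flags(*pair) for drug, pair in rows.items()}
```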

Validation Visualizations

Performance visualizations and diagnostic plots from independent validation datasets, covering discrimination, calibration, threshold choice, classification agreement, mutation co-occurrence, and score distributions.

ROC AUC Analysis
ROC AUC Scatter Analysis

ROC curves and AUC scores across different antiretroviral drug classes. Demonstrates diagnostic performance trade-offs between sensitivity and specificity.
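
As a reminder of what an AUC score measures, the sketch below computes it as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count half). The function name and the example data are hypothetical; this is not the dashboard's plotting code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical binary labels (1 = resistant) and model scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```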

Model Calibration Analysis
Calibration Comparison

Comparison of predicted probabilities against observed frequencies. Assesses model calibration and reliability of confidence estimates.
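
A calibration comparison of this kind is typically built by binning predictions and comparing the mean predicted probability in each bin with the observed event rate. The sketch below assumes equal-width bins; the function name, bin count, and data are hypothetical.

```python
def calibration_bins(probs, outcomes, n_bins=5):
    """Return (mean predicted, observed frequency, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[i].append((p, y))
    table = []
    for b in bins:
        if b:
            mean_pred = sum(p for p, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            table.append((mean_pred, observed, len(b)))
    return table

# Hypothetical predicted probabilities and binary outcomes.
table = calibration_bins([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2)
```

A well-calibrated model has mean predicted values close to the observed frequencies in every bin.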

Threshold Optimization Analysis
Cutoff vs Performance

Sensitivity, specificity, and accuracy as functions of classification thresholds. Guides optimal threshold selection for clinical decision-making.
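
The cutoff analysis can be sketched as a sweep over candidate thresholds, recording sensitivity, specificity, and accuracy at each one. The function name, the "score >= threshold means positive" rule, and the data are assumptions for illustration.

```python
def sweep_thresholds(labels, scores, thresholds):
    """Return (threshold, sensitivity, specificity, accuracy) for each cutoff."""
    rows = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        acc = (tp + tn) / len(labels)
        rows.append((t, sens, spec, acc))
    return rows

# Hypothetical binary labels (1 = resistant) and model scores.
rows = sweep_thresholds([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], [0.3, 0.5])
```

In this toy case both cutoffs give the same accuracy but trade sensitivity against specificity, which is the trade-off the plot is meant to expose.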

Classification Performance Analysis
Accuracy vs Weighted Kappa

Relationship between accuracy and weighted Cohen's kappa across different drugs. Evaluates classification reliability accounting for class imbalance.
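
For reference, a linearly weighted Cohen's kappa over the ordered categories S < I < R can be computed as below. The linear weighting, the function names, and the example labels are assumptions; the dashboard does not state which weighting scheme it uses.

```python
def accuracy(actual, predicted):
    """Fraction of samples assigned to the correct category."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def weighted_kappa(actual, predicted, categories=("S", "I", "R")):
    """Linearly weighted Cohen's kappa for ordinal categories."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(actual)
    observed = [[0] * k for _ in range(k)]
    for a, p in zip(actual, predicted):
        observed[index[a]][index[p]] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)          # disagreement weight
            num += w * observed[i][j]         # observed weighted disagreement
            den += w * row[i] * col[j] / n    # disagreement expected by chance
    return 1.0 - num / den if den else float("nan")

# Hypothetical labels: half the calls are right, but no better than chance.
actual    = ["S", "S", "R", "R"]
predicted = ["S", "R", "R", "S"]
acc = accuracy(actual, predicted)
kappa = weighted_kappa(actual, predicted)
```

The example shows why the two are plotted together: accuracy is 0.5 while kappa is 0, because the agreement is what chance alone would produce.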

Mutation Network Analysis
Mutation Network Visualization

Network visualization of mutation co-occurrence patterns and their associations with drug resistance. Reveals complex mutation interactions.
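
The edges of a co-occurrence network like this are usually derived by counting how often two mutations appear in the same sample. A minimal sketch, with a hypothetical function name, a hypothetical min_count cutoff, and illustrative input lists:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(samples, min_count=2):
    """Count mutation pairs that appear together in at least `min_count` samples."""
    counts = Counter()
    for mutations in samples:
        # Sort so each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(set(mutations)), 2):
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}

# Illustrative per-sample mutation lists; not taken from the training data.
samples = [
    ["M184V", "K103N"],
    ["M184V", "K103N", "K65R"],
    ["K65R"],
]
edges = cooccurrence_edges(samples)
```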

PhenoGenX Score Distribution

Distribution of PhenoGenX resistance scores across different drug classes. Shows model output characteristics and score ranges.

Enhanced Score Distribution Analysis
Enhanced Score Distribution

Comprehensive analysis of PhenoGenX score distributions with kernel density estimation. Provides detailed insights into model output characteristics, including multi-modal distributions and class separability across different resistance categories.
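
Kernel density estimation smooths a set of scores into a continuous curve, which is how multi-modal shapes become visible. The sketch below uses a Gaussian kernel with a fixed bandwidth; the function name, the bandwidth, and the scores are hypothetical.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from `samples` with a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical scores forming two clusters (e.g. susceptible vs resistant).
density = gaussian_kde([0.10, 0.15, 0.20, 0.80, 0.85, 0.90], bandwidth=0.05)
peaks_exceed_valley = density(0.15) > density(0.5) and density(0.85) > density(0.5)
```

Two peaks separated by a low-density valley, as in this toy data, are what good class separability looks like in such a plot.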

Visualization Notes

Dynamic Loading: All visualizations are loaded from the /static/images/ directory and are refreshed automatically when new validation results become available.

Image Generation: These visualizations are generated automatically during model validation.