Disease Context
A three-step machine-learning pipeline that turns blood RNA into a per-patient SLE readout.
Disease Context
Current Gap
Our approach
Same modelling workflow, different clinical questions.
Method · how each model stage is done
"Does this patient have lupus?"
"Will they be active at the next visit?"
"Will they respond to first-line therapy?"
Evaluation metrics
One selection rule across every stage — accuracy plus the metrics that matter in a clinic.
Share of patients classified correctly on held-out cohorts.
An honest top-line figure, but it hides class imbalance.
Harmonic mean of precision and recall on the minority class.
Penalises missed flares, the costly clinical error.
Patient-level CV plus a one-time external cohort (GSE49454).
Confirms the panel transfers across labs and platforms.
End-to-end latency from upload to risk band on a single patient.
Sub-second response matters for a real bedside tool.
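To make the first two metrics concrete, here is a minimal sketch in plain Python. The labels are invented for illustration (1 = flare, the minority class) and are not project data; the function names are ours, not part of the pipeline. It shows why accuracy alone can look healthy while every flare is missed.

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (assumption): 8 quiet visits, 2 flares, and a model that
# always predicts "no flare".
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

acc = accuracy(y_true, y_pred)
flare_f1 = f1_for_class(y_true, y_pred, positive=1)
print(f"accuracy={acc:.2f}, flare F1={flare_f1:.2f}")
# prints: accuracy=0.80, flare F1=0.00
```

The always-negative model scores 80% accuracy yet has a minority-class F1 of zero, which is the gap the second metric is meant to expose.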
Results · external validation
0.97 AUROC
Random Forest beat the limma signature and LASSO on macro-F1. 5-fold stratified CV · GSE72509 · 117 samples.
0.82 AUC
External hold-out — never seen during training. Trained on GSE65391, tested on GSE49454.
0.87 AUROC
Random Forest ranked top on macro-F1 across five candidates. 5-fold CV with balanced bootstrap · GSE224705 · SRI-4 endpoint.
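For readers less familiar with the headline numbers above: AUROC can be read as the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition in plain Python. The scores and labels are invented for illustration, and this pairwise loop is a teaching version, not the evaluation code used in the project.

```python
def auroc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (assumption): three positive and three negative patients.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

value = auroc(y_true, scores)
print(f"AUROC={value:.3f}")
# prints: AUROC=0.889
```

Eight of the nine positive-negative pairs are ordered correctly here, giving 8/9. A value of 0.5 would correspond to random ranking.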
Final architecture
Whole-blood expression upload, validated against the gene panel.
Z-scoring against the reference distribution, plus missing-value imputation.
~50 transcripts selected on the training cohort only.
Three calibrated outputs: diagnosis, flare risk, treatment response.
Banded risk score with the top features behind every prediction.
No retraining at inference: the same trained model serves every patient request.
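The inference path described above can be sketched as follows. Everything named here is a hypothetical placeholder: the gene names, the reference statistics, the band thresholds, and the `stand_in_model` function, which only stands in for the trained Random Forest so the example runs on its own. Imputing a missing transcript with the reference mean (a z-score of 0) is also an assumption, since the deck does not state the imputation method.

```python
import math

# Hypothetical reference statistics per panel transcript: (mean, std dev).
REFERENCE = {"GENE_A": (5.0, 2.0), "GENE_B": (4.0, 1.0), "GENE_C": (3.0, 0.5)}


def preprocess(sample):
    """z-score each panel transcript against the reference distribution.

    Missing or NaN values are imputed with the reference mean, i.e. z = 0
    (an assumption made for this sketch).
    """
    features = []
    for gene, (mean, sd) in REFERENCE.items():
        value = sample.get(gene)
        if value is None or math.isnan(value):
            features.append(0.0)
        else:
            features.append((value - mean) / sd)
    return features


def stand_in_model(features):
    """Placeholder for the trained model: logistic of the mean z-score."""
    mean_z = sum(features) / len(features)
    return 1.0 / (1.0 + math.exp(-mean_z))


def risk_band(probability, low=0.33, high=0.66):
    """Map a probability to a band; the thresholds are illustrative only."""
    if probability < low:
        return "low"
    if probability < high:
        return "moderate"
    return "high"


# One patient upload with GENE_C missing (toy values).
upload = {"GENE_A": 9.0, "GENE_B": 5.0}
features = preprocess(upload)
probability = stand_in_model(features)
band = risk_band(probability)
print(features, round(probability, 3), band)
```

Because the reference statistics and the model are fixed at training time, each request only runs the transform and a single prediction, which is what keeps inference free of retraining.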
Live demonstration
Lupus care is reactive: molecular signal exists in blood but rarely reaches the clinic.
A three-step Random-Forest pipeline that classifies, predicts flares, and forecasts treatment response.
External validation holds up, inference is sub-second, and the console is usable by a non-ML clinician.
Validate on a contemporary, ancestry-diverse cohort to confirm calibration in the real world.
Layer autoantibody, cytokine and methylation features alongside the transcript module.
Host as a clinician-facing web service with per-prediction SHAP-style attribution.
Clinical background, methods, prior art, and scoring referenced in the deck.
Dr. Andy Tran · Elyna Lin
Many thanks to our supervisors Andy and Elyna for the weekly guidance, thoughtful feedback, and steady support throughout the project.