Monarch

A three-stage clinical decision support console for Systemic Lupus Erythematosus.

Presenter: Lezhi Lin

Authors: Manna Berry¹, Lezhi Lin¹, Udit Samant¹, Hadi Shafat¹, Jillian Zhao¹ and Minh Hieu Tran⁶

Advisors: Dr. Andy Tran and Elyna Lin

Background: Systemic Lupus Erythematosus

Previous Innovations: A better three-stage machine learning pipeline

Diagnosis

Progression

Treatment

Methodology: modelling workflow

01 Data collection

Diagnosis: GSE72509
Progression: GSE65391, GSE49454
Treatment: GSE224705

02 Pre-processing & EDA

Dataset cleaning
Quality check
Standardisation
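
The slides do not say which standardisation was applied. As a minimal sketch, assuming it means per-gene z-scoring across samples, the step could look like the following. The function name `standardise` and the toy matrix are illustrative only and are not taken from the project code.

```python
from statistics import mean, pstdev


def standardise(matrix):
    """Z-score each gene (column) across samples (rows)."""
    columns = list(zip(*matrix))
    scaled_columns = []
    for values in columns:
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            # A constant gene carries no signal; map it to zeros
            # instead of dividing by zero.
            scaled_columns.append([0.0] * len(values))
        else:
            scaled_columns.append([(v - mu) / sigma for v in values])
    # Transpose back to samples x genes.
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical toy input: 3 samples x 2 genes (second gene is constant).
expression = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
scaled = standardise(expression)
```

After scaling, each non-constant gene has mean 0 and unit variance, so genes measured on different scales contribute comparably to the linear models.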

03 Gene selection

Ranked by importance
Curated to a compact panel
141 genes in total

04 Machine learning

Limma

LASSO

Elastic Net

GBM

Linear SVM

Random Forest

05 Performance evaluation

Imbalanced data

Stratified 5-fold cross-validation

AUROC, Balanced Accuracy, Macro-F1
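
The evaluation terms above can be illustrated with a short, dependency-free sketch. It is not the team's R implementation; the function names, the round-robin fold assignment, and the toy labels and scores are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds while keeping class ratios similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def auroc(y_true, scores):
    """Chance that a random positive outranks a random negative (ties count half)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def class_counts(y_true, y_pred, label):
    """True positives, false positives, false negatives for one class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so the majority class cannot dominate."""
    recalls = []
    for label in sorted(set(y_true)):
        tp, _, fn = class_counts(y_true, y_pred, label)
        recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for label in sorted(set(y_true)):
        tp, fp, fn = class_counts(y_true, y_pred, label)
        denominator = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical imbalanced cohort: 10 positives, 40 negatives.
labels = [1] * 10 + [0] * 40
folds = stratified_folds(labels, k=5)

# Hypothetical held-out predictions: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

auc = auroc(y_true, scores)
bal_acc = balanced_accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

On this toy split plain accuracy would be 0.8, while balanced accuracy and macro-F1 both fall to about 0.69 because one of the two positives is missed. That gap is the reason class-balanced metrics suit imbalanced cohorts.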

A Random Forest model leading the field

AUROC by stage (higher is better)

Stage	Monarch (RF)	Literature benchmark
01 Diagnosis	0.972	0.875
02 Progression	0.852	0.670
03 Treatment	0.865	0.750

Typical machine learning products in clinical practice

Model discrimination benchmarked against literature results.

Monarch

From a blood sample to a recommendation in one click.

Monarch

Decision support for SLE

Conclusions

Lupus prevalence by country

Annual direct cost by country

Exagen AVISE (2012)

Meet Our Team

Udit Samant

Diagnosis model

Manna Berry

Progression model

Jillian Zhao

Treatment model

Hadi Shafat

Research

Lezhi Lin

App development

Dr. Andy Tran and Elyna Lin

Our beloved advisors

Thank you.

Appendix · 01 / 07

Full model performance.

01 Diagnosis

Model	AUROC	Balanced accuracy	Macro-F1
RF	0.972	0.939	0.950
LASSO	0.962	0.934	0.934
limma	0.953	0.881	0.849

02 Progression

Model	AUROC	Balanced accuracy	Macro-F1
RF	0.852	0.790	0.789
Elastic Net	0.845	0.801	0.800
GBM	0.836	0.791	0.789

03 Treatment

Model	AUROC	Balanced accuracy	Macro-F1
RF	0.865	0.789	0.791
Elastic Net	0.780	0.736	0.723
LASSO	0.757	0.722	0.698
Linear SVM	0.792	0.709	0.682
limma	0.656	0.621	0.621

Appendix · 02 / 07

Background & epidemiology.

Clinical background, epidemiology, and the global burden of the disease.

Aringer, M., & Bertsias, G. (2025). Early diagnosis of systemic lupus erythematosus. Rare Disease and Orphan Drugs Journal, 4, 13. https://doi.org/10.20517/rdodj.2024.59
Baechler, E. C., Batliwalla, F. M., Karypis, G., Gaffney, P. M., Ortmann, W. A., Espe, K. J., Shark, K. B., Grande, W. J., Hughes, K. M., Kapur, V., Gregersen, P. K., & Behrens, T. W. (2003). Interferon-inducible gene expression signature in peripheral blood cells of patients with severe lupus. Proceedings of the National Academy of Sciences, 100(5), 2610–2615. https://doi.org/10.1073/pnas.0337679100
Kwon, Y.-C., Chun, S., Kim, K., & Mak, A. (2019). Update on the genetics of systemic lupus erythematosus: Genome-wide association studies and beyond. Cells, 8(10), 1180. https://doi.org/10.3390/cells8101180
Lin, D. H., Murimi-Worstell, I. B., Kan, H., Tierce, J. C., Wang, X., Nab, H., Desta, B., Hammond, E. R., & Alexander, G. C. (2022). Health care utilization and costs of systemic lupus erythematosus in the United States: A systematic review. Lupus, 31(7), 773–807. https://doi.org/10.1177/09612033221088209
National Health Service. (2023). Lupus. https://www.nhs.uk/conditions/lupus/ [Reviewed July 19, 2023]
National Institute of Arthritis and Musculoskeletal and Skin Diseases. (2022). Systemic lupus erythematosus (lupus). https://www.niams.nih.gov/health-topics/lupus [Last reviewed October 2022]
National Library of Medicine. (2024). Lupus. MedlinePlus. https://medlineplus.gov/lupus.html [Last updated July 1, 2024]
Natural Earth. (n.d.). Admin 0 – Countries (5.1.1) [Data set]. https://www.naturalearthdata.com/ Public-domain vector map dataset. Retrieved 2026.
Tian, J., Zhang, D., Yao, X., Huang, Y., & Lu, Q. (2023). Global epidemiology of systemic lupus erythematosus: A comprehensive systematic analysis and modelling study. Annals of the Rheumatic Diseases, 82(3), 351–356. https://doi.org/10.1136/ard-2022-223035
Wang, H., Li, M., Zou, K., Wang, Y., Jia, Q., Wang, L., Zhao, J., Wu, C., Wang, Q., Tian, X., Wang, Y., & Zeng, X. (2023). Annual direct cost and cost-drivers of systemic lupus erythematosus: A multi-center cross-sectional study from CSTAR registry. International Journal of Environmental Research and Public Health, 20(4), 3522. https://doi.org/10.3390/ijerph20043522

Appendix · 03 / 07

Datasets & cohorts.

The public gene-expression cohorts the three models are trained and validated on.

Banchereau, R., Hong, S., Cantarel, B., Baldwin, N., Baisch, J., Edens, M., Cepika, A.-M., Acs, P., Turner, J., Anguiano, E., & Pascual, V. (2016). Personalized immunomonitoring uncovers molecular networks that stratify lupus patients. Cell, 165(3), 551–565. https://doi.org/10.1016/j.cell.2016.03.008 [Data set; GSE65391]
Chiche, L., Jourde-Chiche, N., Whalen, E., Presnell, S., Gersuk, V., Dang, K., & Chaussabel, D. (2014). Modular transcriptional repertoire analyses of adults with SLE reveal distinct type I and type II interferon signatures. Arthritis & Rheumatology, 66(6), 1583–1595. https://doi.org/10.1002/art.38628 [Data set; GSE49454]
Hung, T., Pratt, G. A., Sundararaman, B., Townsend, M. J., Chaivorapol, C., Bhangale, T., Graham, R. R., Ortmann, W., Behrens, T. W., Yeo, G. W., & Chaussabel, D. (2015). The Ro60 autoantigen regulates inflammatory gene expression in SLE [Data set]. NCBI Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE72509 [GSE72509]
NCBI Gene Expression Omnibus. (2023). Whole-blood microarray expression in lupus nephritis: Treatment response by SRI-4 [Data set]. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE224705 [GSE224705]

Appendix · 04 / 07

Modelling & evaluation.

The learning algorithms, feature selection, and clinical scoring behind the pipeline.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. https://doi.org/10.1145/2939672.2939785
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01
Furie, R., Petri, M. A., Wallace, D. J., Ginzler, E. M., Merrill, J. T., Stohl, W., Chatham, W. W., Strand, V., Weinstein, A., & Chevrier, M. (2009). Novel evidence-based systemic lupus erythematosus responder index. Arthritis & Rheumatism, 61(9), 1143–1151. https://doi.org/10.1002/art.24698
Gladman, D. D., Ibañez, D., & Urowitz, M. B. (2002). Systemic lupus erythematosus disease activity index 2000. The Journal of Rheumatology, 29(2), 288–291.
Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). John Wiley & Sons. https://doi.org/10.1002/9781118548387 (AUROC interpretation thresholds — 0.7–0.8 acceptable, 0.8–0.9 excellent)
Kursa, M. B., & Rudnicki, W. R. (2010). Feature selection with the Boruta package. Journal of Statistical Software, 36(11), 1–13. https://doi.org/10.18637/jss.v036.i11
Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 4765–4774.
Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47. https://doi.org/10.1093/nar/gkv007

Appendix · 05 / 07

Competitive landscape.

Existing lupus tests, published model benchmarks, the gene-expression analogue, and market size.

Exagen. (n.d.). AVISE Lupus [Diagnostic test]. Exagen. https://exagen.com/tests/lupus/
Jiang, Z., Shao, M., Dai, X., Pan, Z., & Liu, D. (2022). Identification of diagnostic biomarkers in systemic lupus erythematosus based on bioinformatics analysis and machine learning. Frontiers in Genetics, 13, 865559. https://doi.org/10.3389/fgene.2022.865559
Kegerreis, B., Catalina, M. D., Bachali, P., Geraci, N. S., Labonte, A. C., Zeng, C., Stocks, N., Hubbard, E. L., Grammer, A. C., & Lipsky, P. E. (2019). Machine learning approaches to predict lupus disease activity from gene expression data. Scientific Reports, 9, 9617. https://doi.org/10.1038/s41598-019-45989-0
Lee, D.-J., Tsai, P.-H., Chen, C.-C., & Dai, Y.-H. (2023). Incorporating knowledge of disease-defining hub genes and regulatory network into a machine learning-based model for predicting treatment response in lupus nephritis after the first renal flare. Journal of Translational Medicine, 21, 76. https://doi.org/10.1186/s12967-023-03931-z
Leventhal, E. L., Daamen, A. R., Grammer, A. C., & Lipsky, P. E. (2023). An interpretable machine learning pipeline based on transcriptomics predicts phenotypes of lupus patients. iScience, 26(10), 108042. https://doi.org/10.1016/j.isci.2023.108042
Li, Y., Yao, L., Lee, Y. A., Huang, Y., Merkel, P. A., Vina, E., Yeh, Y.-Y., Li, Y., Allen, J. M., Bian, J., & Guo, J. (2025). A fair machine learning model to predict flares of systemic lupus erythematosus. JAMIA Open, 8(4), ooaf072. https://doi.org/10.1093/jamiaopen/ooaf072
Munguía-Realpozo, P., Etchegaray-Morales, I., Mendoza-Pinto, C., Méndez-Martínez, S., Osorio-Peña, Á. D., Ayón-Aguilar, J., & García-Carrasco, M. (2023). Current state and completeness of reporting clinical prediction models using machine learning in systemic lupus erythematosus: A systematic review. Autoimmunity Reviews, 22(5), 103294. https://doi.org/10.1016/j.autrev.2023.103294
Paik, S., Shak, S., Tang, G., Kim, C., Baker, J., Cronin, M., Baehner, F. L., Walker, M. G., Watson, D., Park, T., Hiller, W., Fisher, E. R., Wickerham, D. L., Bryant, J., & Wolmark, N. (2004). A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. New England Journal of Medicine, 351(27), 2817–2826. https://doi.org/10.1056/NEJMoa041588
Progentec Diagnostics. (n.d.-a). aiSLE DX Flare Risk Index [Diagnostic test]. Progentec. https://www.progentec.com/aisle-dx-fri
Progentec Diagnostics. (n.d.-b). aiSLE MGMT: Lupus care-management platform [Software]. Progentec. https://www.progentec.com/aisle-mgmt
Research and Markets. (2025, January 16). Systemic lupus erythematosus (SLE) market forecast at $6.19 billion by 2034 [Market report]. GlobeNewswire. https://www.globenewswire.com/…
Virtue Market Research. (2025, December 16). Point-of-care testing for systemic lupus erythematosus (SLE) market [Market report]. OpenPR. https://www.openpr.com/news/4316397/the-global-point-of-care-testing-for-systemic-lupus

Appendix · 06 / 07

Technology & craft.

The tools the system is built with, and the design language it follows.

Apple Inc. (2024). Human interface guidelines. Apple Developer. https://developer.apple.com/design/human-interface-guidelines
Bernews. (2023, May 10). World Lupus Day being recognized today. Bernews. https://bernews.com/2023/05/world-lupus-day-wednesday-may-10/
Blueastro. (2025). Systemic lupus erythematosus: Guy with the typical butterfly rash in lupus, SLE, skin disease [Illustration]. iStock. https://www.istockphoto.com/…
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22.
R Core Team. (2024). R: A language and environment for statistical computing (4.x) [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/
Schloerke, B., & Allen, J. (2024). plumber: An API generator for R [R package]. https://www.rplumber.io/
Siriseriwan, W. (2019). smotefamily: A collection of oversampling techniques for class imbalance based on SMOTE (1.3.1) [R package]. https://CRAN.R-project.org/package=smotefamily
The University of Sydney. (2023). University of Sydney visual identity guidelines. The University of Sydney.

Appendix · 07 / 07

Team & acknowledgements.

DATA3888 · 2026
The University of Sydney

Monarch - Developers

Manna Berry¹
Development of Progression Model & Assistance with Backend
mber0347@uni.sydney.edu.au
Faculty of Engineering J12, The University of Sydney, NSW 2006

Lezhi Lin¹
Development of App Frontend & Presentation Slides
llin0935@uni.sydney.edu.au
School of Mathematics and Statistics F07, The University of Sydney, NSW 2006 Australia

Udit Samant¹
Development of Diagnosis Model & General App Backend
usam6049@uni.sydney.edu.au
School of Computer Science J12, The University of Sydney, NSW 2006 Australia

Hadi Shafat¹
Interdisciplinary Aspects Research & Assistance with Backend
hsha0153@uni.sydney.edu.au
School of Computer Science J12, The University of Sydney, NSW 2006 Australia

Jillian Zhao¹
Development of Treatment Model, Assistance with Backend & Background Research
yzha0369@uni.sydney.edu.au
School of Computer Science J12, The University of Sydney, NSW 2006 Australia

Minh Hieu Tran⁶
Assistance with Exploratory Data Analysis
mtra0191@uni.sydney.edu.au
School of Computer Science J12, The University of Sydney, NSW 2006 Australia

Acknowledgements

We acknowledge the Gadigal of the Eora Nation, the Traditional Custodians of the land on which the University of Sydney stands, and pay our respects to Elders past and present.
These slides are submitted in partial fulfilment of the assessment requirements for DATA3888 Data Science Capstone at The University of Sydney. Our work builds on that of open-source maintainers across R, Bioconductor, and the modelling libraries used here, and on the DATA3888 teaching team's project structure, feedback, and course support.
We are extremely grateful to our advisors, Dr. Andy Tran and Elyna Lin, for their guidance, thoughtful feedback, and steady support in workshops and consultations throughout the project.
We thank fellow students Anina Xinyu Shu and Sicheng Chen for their valuable insights on the development of this project and the construction of the presentation.
We acknowledge the original data contributors and study participants behind the public GEO cohorts. Their shared expression and clinical metadata made the modelling, validation, and patient-level demonstrations possible.
We acknowledge the use of AI-assisted tools to support drafting, code iteration, interface refinement, and debugging. All AI-assisted outputs were reviewed, edited, and validated by the team, who remain responsible for the final analysis, design decisions, and implementation.

Monarch

Background and Motivations

SLE is complex and unpredictable

Current care is still reactive

Background

SLE is complex, visible in some moments, and hidden in others.

Background: Systemic Lupus Erythematosus

Previous Innovations: A better three-stage machine learning pipeline

Methodology: modelling workflow

01 Data collection

02 Pre-processing & EDA

03 Gene selection

04 Machine learning

05 Performance evaluation

Methodology: how each model stage is done

Diagnosis

Progression

Treatment response

A Random Forest model leading the field

Monarch

Conclusions

Conclusion

Shared workflow

Primary limitation

Future work

Real-world integration

Preprocessing support

Gene panel

Meet Our Team

Thank you.

Full model performance.

Background & epidemiology.

Datasets & cohorts.

Modelling & evaluation.

Competitive landscape.

Technology & craft.

Team & acknowledgements.