Nan Xiao

About

I’m a data scientist in Product Development Data Sciences at Genentech, working at the intersection of statistical computing infrastructure and research software engineering. My focus is on building software and AI infrastructure that makes clinical development efficient and reliable.

I served on the R Consortium Infrastructure Steering Committee from 2022 to 2026. I also contribute to the R Submissions Working Group, which pioneered the first successful open source R submission pilot to the FDA. I’m a regular contributor to pharmaverse, an ecosystem of open source tools for clinical reporting.

My research interests include sparse linear models, representation learning, and developer tooling. I build software in R, Python, and Rust. Projects I maintain include ggsci, pkglite, rtflite, tinytopics, msaenet, and revdeprun.

Previously, I was a statistician in Methodology Research, led by Keaven M. Anderson, at Merck & Co., Inc. Earlier, I was a data scientist at Seven Bridges, building cloud platforms for genomic data analysis. I studied human genetics in Matthew Stephens’ lab at the University of Chicago. I have a Ph.D. in Statistics from Central South University, where I developed statistical machine learning methods for high-dimensional data with Qing-Song Xu.