Modern high-throughput biological assays let us ask detailed questions about how diseases operate, and promise to let us personalize therapy. Our intuition about what the answers “should” look like in high dimensions is very poor, so careful data processing is essential. When documentation of such processing is absent or incomplete, we must apply “forensic bioinformatics” to work backwards from the raw data and the reported results to infer what the methods must have been. Such explorations occasionally reveal errors. We discuss some cases where errors eventually affected patient care. The most common errors we uncover are simple ones, often involving mislabeling of rows, columns, or variables. These errors are easy to make, but if documentation is adequate, they may be easy to fix. Incomplete documentation is, however, pervasive in much of the scientific literature.
Keith earned both his undergraduate and graduate degrees in statistics from Rice University (Ph.D. ’94). After two years at Los Alamos and four on the Rice faculty, he joined the bioinformatics group at MD Anderson, where he remained until 2018 as the Ransom Horne, Jr., Professor of Cancer Research.
Keith has become an advocate for analytical reproducibility in biomedicine. His work has led to new guidance from the FDA, an Institute of Medicine (IOM) report on how “Omics” assays should be used in guiding patient care, and new guidelines from the NCI for the trials it funds. It also helped prompt the NIH’s 2016 Rigor and Reproducibility initiative.
Keith’s work has been featured on 60 Minutes and the front page of the New York Times. He is a Fellow of the American Statistical Association. In 2019, Rice’s School of Engineering named him one of their “Outstanding Alumni”.