Benjamin Lengerich
Sample-Specific Models for Precision Medicine
Degree Type: Ph.D. in Computer Science
Advisor(s): Eric Xing
Graduated: December 2020

Abstract:
Modern applications of artificial intelligence are often characterized by training large machine learning (ML) models on large datasets. These datasets are composed of overlapping groups of samples, either explicitly (e.g., the large dataset is created by combining multiple datasets) or implicitly (e.g., the samples belong to latent sub-populations). Population models prefer weakly predictive global patterns over highly predictive localized effects. This is a problem because localized effects are critical to understanding complex processes, such as those in computational biology (in which samples come from latent cell types) and precision medicine (in which patients come from latent disease subtypes).

In this thesis, we propose that:

The performance of intelligent computer systems can be improved by treating different samples as different tasks.

This is especially helpful in domains such as computational biology and precision medicine, in which we care about understanding the highly specific context of each sample. We propose to solve this problem by estimating a collection of many small models. For large collections, each model is responsible for only a small number of samples, enabling simultaneous interpretability and accuracy. As we show in this thesis, this framework can be scaled to estimate different model parameters for every sample.

This thesis begins by studying the challenges of characterizing real-world data with population-level models. Next, we develop the methodology of Personalized Regression. Finally, we apply sample-specific inference to computational biology and precision medicine by: (1) Identifying Discriminative Subtypes of Cancers from Histopathology Images and (2) Cell-Specific Transcriptomic Regulatory Network Inference.

Thesis Committee:
Eric P. Xing (Chair)
Zico Kolter
Ziv Bar-Joseph
Manolis Kellis (Massachusetts Institute of Technology)
Rich Caruana (Microsoft Research)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Keywords: Personalized Machine Learning, Sample-Specific Models, Precision Medicine

CMU-CS-20-139.pdf (27.67 MB) (103 pages)