Prediction and Causation in EHRs
David Page
November 17, 2021, Wednesday, 3:00 PM - 4:00 PM EST
Some applications of machine learning in medicine need only accurate prediction, while others require accurate attribution of cause and effect. This talk presents empirical and theoretical results for both types of application, including the following. The first occurrence of thousands of ICD codes can be predicted with an average AUC above 0.7, and accuracy can be improved significantly further by using family histories constructed entirely automatically from de-identified patient data. Beneficial and harmful side effects of drugs can be identified accurately by machine learning, but only when algorithms are modified to account for unobserved and partially observed confounders, especially time-varying confounders.
David Page is a professor of Biostatistics and Bioinformatics. He works on algorithms for data mining and machine learning and on their applications to biomedical data, especially de-identified electronic health records and high-throughput genetic and other molecular data. Of particular interest are machine learning methods for complex multi-relational data (such as electronic health records or molecules) and irregular temporal data, and methods that find causal relationships or produce human-interpretable output (such as rules for molecular bioactivity).