Big Data, Big Bias, Bad AI
Leo Anthony Celi
Friday, November 3, 2023, 2:00 PM - 3:00 PM EDT
The application of artificial intelligence in healthcare requires a team science approach. A diverse set of expertise, perspectives and lived experiences is required to understand the various ways bias lurks in the data, including bias introduced by sample selection (who made it to the database, who didn't, and what the impact is on downstream models), variation in the frequency of measurement that is not explained by the disease or patient phenotype (aka "shortcut" features in medical images), and technology that performs differently across patient subgroups (e.g. pulse oximetry, wearable sensors optimized around fit individuals). Data bias is the roadblock to realizing the promise of machine learning. Addressing algorithmic bias is not just about evaluating model performance across patient subgroups post hoc. The goal is to ascertain that the model does not learn from features that should not affect decision making. Offering chemotherapy should not depend on whether a patient is on Medicaid or has private insurance, predicting job performance should not be informed by the gender of the applicant, and optimizing treatment for sepsis should not be confounded by the use of infrared sensing technology. This is much easier said than done, because computers can easily learn sensitive attributes that the human eye does not see. Evaluating the models on real-world data makes this extremely challenging: excellent model accuracy means that existing outcome disparities are fully encoded in the algorithms.
As clinical research director and principal research scientist at the MIT Laboratory for Computational Physiology (LCP), and as a practicing intensive care unit (ICU) physician at the Beth Israel Deaconess Medical Center (BIDMC), Leo brings together clinicians and data scientists to support research using data routinely collected in the process of care. His group built and maintains the publicly available Medical Information Mart for Intensive Care (MIMIC) database and the Philips-MIT eICU Collaborative Research Database, which have more than 20,000 users from around the world. In addition, Leo is one of the course directors for HST.936 - global health informatics to improve quality of care, and HST.953 - collaborative data science in medicine, both at MIT. He is an editor of the textbook for each course, both released under an open access license. "Secondary Analysis of Electronic Health Records" has been downloaded more than a million times and has been translated into Mandarin, Spanish, Korean and Portuguese. Leo has spoken in more than 35 countries across 6 continents about the value of data and learning in health systems.