Programmatic Supervision for Machine Learning
Alex Ratner
August 19, 2020, Wednesday, 3:00 PM - 4:00 PM EDT
Abstract
One of the major bottlenecks in the development and deployment of AI applications is the need for the massive labeled training datasets that drive modern ML approaches. These training datasets have traditionally been labeled by hand at great cost in time and money, and often cannot practically be hand-labeled at all because of privacy, expertise, and/or rate-of-change requirements in real-world settings such as healthcare.

This talk will cover a range of *programmatic* (often called "weak supervision") approaches to building, labeling, augmenting, and structuring training datasets, as well as their broader effects on end-to-end ML and AI application development. Specifically, it will cover programmatic labeling techniques, such as the data programming and Snorkel approaches; data augmentation techniques that add transformed copies of data to a dataset to increase model robustness; data structuring or "slicing" techniques for highlighting, monitoring, and enabling models to attend to critical and/or difficult subsets of the data; and other key techniques for training data management.
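As a rough illustration of the programmatic labeling idea mentioned above, the sketch below writes a few heuristic "labeling functions" in plain Python and combines their votes. Everything in it is an assumption made for illustration: the spam/ham task, the function names, and the heuristics are hypothetical, and the simple majority vote is a stand-in for the learned label model that data programming uses to weigh noisy, conflicting sources. It does not use the Snorkel library's API.

```python
from collections import Counter

# Hypothetical label values for an illustrative spam-detection task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Vote SPAM if the message contains a link; otherwise abstain."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    """Vote SPAM if the message mentions a prize; otherwise abstain."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    """Vote HAM for very short messages; otherwise abstain."""
    return HAM if len(text.split()) < 5 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def majority_vote(text):
    """Combine labeling-function votes; abstain on no votes or a tie.

    Simplified stand-in: data programming instead fits a label model
    that estimates how accurate each labeling function is.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


examples = [
    "Claim your prize at http://example.com today now",
    "See you soon",
    "The quarterly report is attached for your review",
]
labels = [majority_vote(text) for text in examples]
print(labels)
```

Examples on which every labeling function abstains, or on which the votes tie, stay unlabeled in this sketch; the remaining examples form a noisily labeled training set for a downstream model.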

More broadly, this talk will address how these new programmatic approaches lead to a whole new end-to-end ML/AI application development process. Using the example of Snorkel Flow, a new platform for this process, I will cover these ideas and how they extend to model training, monitoring, and analysis, and to the feedback loops that lead to actionable modification or extension of the programmatic supervision approaches. The result is a more iterative, error-analysis-driven development and deployment process for ML and AI applications overall.
Bio
Alex Ratner holds a PhD in Computer Science from Stanford, where he was advised by Christopher Ré. His research focuses on applying data management and statistical learning techniques to emerging machine learning workflows, such as creating and managing training data, and on applying these techniques to real-world problems in medicine, knowledge base construction, and more. He leads the Snorkel project (snorkel.stanford.edu), which has been deployed at large technology companies, academic labs, and government agencies. He is co-founder and CEO of Snorkel AI and an Assistant Professor at the University of Washington.