Doctoral Thesis Proposal - Joshua Nathaniel Williams

— 3:00pm

Location:
In Person and Virtual - ET - Reddy Conference Room, Gates Hillman 4405 and Zoom

Speaker:
JOSHUA NATHANIEL WILLIAMS, Ph.D. Student, Computer Science Department, Carnegie Mellon University
https://jnwilliams.github.io/

It is well understood that generative models can exhibit representational biases across social groups: without explicit conditioning, the vast majority of generated airplane pilots will be male and the vast majority of generated bank tellers will be female. While much of the existing work focuses on hypothesis-driven investigations into these biases, this proposed thesis focuses on discovering new and unexpected patterns in how these models represent people. Through the lens of counterfactual explainability (processes that find minimal input changes that alter a model's output), we present a framework for hypothesis generation that reveals surprising patterns in how text-to-image models align textual input with socially meaningful group memberships.

We establish a formal, mathematical distinction between counterfactual explanations and adversarial examples, which enables the development of novel distance metrics and optimization strategies for identifying human-readable counterfactual prompts. Our approach employs discrete optimization techniques, including an adapter for computing gradients across embeddings from diverse models. Furthermore, we evaluate discrete prompt optimization methods and demonstrate that additional regularization is crucial for generating coherent, human-like prompts. This work lays the foundation for a more nuanced understanding of representational biases in generative models and offers tools for their systematic exploration.

Thesis Committee

Zico Kolter (Chair)
Aditi Raghunathan
Hoda Heidari
Sarah Laszlo (Visa)

Additional Information

In Person and Zoom Participation. See announcement.
