Publications by Matt Fredrikson

Preprint

A Mixture of Linear Corrections Generates Secure Code

2025
Yu W, Mangal R, Zhuo T, Fredrikson M, Pasareanu CS

Conference

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

2025 • 13th International Conference on Learning Representations, ICLR 2025 • 18136-18171
Andriushchenko M, Souly A, Dziemian M, Duenas D, Lin M, Wang J, Hendrycks D, Zou A, Kolter Z, Fredrikson M, Gal Y, Davies X

Conference

Aligned LLMs Are Not Aligned Browser Agents

2025 • 13th International Conference on Learning Representations, ICLR 2025 • 62386-62407
Kumar P, Lau E, Vijayakumar S, Trinh T, Team SR, Chang E, Robinson V, Hendryx S, Zhou S, Fredrikson M, Yue S, Wang Z

Conference

LLM Whisperer: An Inconspicuous Attack to Bias LLM Responses

2025 • PROCEEDINGS OF THE 2025 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2025
Lin W, Gerchanovsky A, Akgul O, Bauer L, Fredrikson M, Wang Z

Conference

A Recipe for Improved Certifiable Robustness

2024 • 12th International Conference on Learning Representations, ICLR 2024
Hu K, Leino K, Wang Z, Fredrikson M

Conference

Attacks and Defenses for Large Language Models on Coding Tasks

2024 • Proceedings / IEEE International Conference, Automated Software Engineering ; sponsored by IEEE Computer Society, NASA Ames Research Center, in cooperation with AAAI, ACM SIGART and SIGSOFT. IEEE International Automated Software Enginee... • 2268-2272
Zhang C, Wang Z, Zhao R, Mangal R, Fredrikson M, Jia L, Pasareanu CS

Conference

Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

2024 • Advances in Neural Information Processing Systems • 37:
Hu K, Yu W, Li Y, Yao T, Li X, Liu W, Yu L, Shen Z, Chen K, Fredrikson M

Conference

Improving Alignment and Robustness with Circuit Breakers

2024 • Advances in Neural Information Processing Systems • 37:
Zou A, Phan L, Wang J, Duenas D, Lin M, Andriushchenko M, Wang R, Kolter Z, Fredrikson M, Hendrycks D

Preprint

LLM Whisperer: An Inconspicuous Attack to Bias LLM Responses

2024
Lin W, Gerchanovsky A, Akgul O, Bauer L, Fredrikson M, Wang Z

Preprint

A Recipe for Improved Certifiable Robustness

2023
Hu K, Leino K, Wang Z, Fredrikson M

Preprint

Is Certifying $\ell_p$ Robustness Still Worthwhile?

2023
Mangal R, Leino K, Wang Z, Hu K, Yu W, Pasareanu C, Datta A, Fredrikson M

Conference

On the Perils of Cascading Robust Classifiers

2023 • 11th International Conference on Learning Representations, ICLR 2023
Mangal R, Wang Z, Zhang C, Leino K, Păsăreanu C, Fredrikson M

Preprint

Representation Engineering: A Top-Down Approach to AI Transparency

2023
Zou A, Phan L, Chen S, Campbell J, Guo P, Ren R, Pan A, Yin X, Mazeika M, Dombrowski A-K, Goel S, Li N, Byun MJ, Wang Z, Mallen A, Basart S, Koyejo S, Song D, Fredrikson M, Kolter JZ, Hendrycks D

Preprint

Transfer Attacks and Defenses for Large Language Models on Coding Tasks

2023
Zhang C, Wang Z, Mangal R, Fredrikson M, Jia L, Pasareanu C

Preprint

Universal and Transferable Adversarial Attacks on Aligned Language Models

2023
Zou A, Wang Z, Carlini N, Nasr M, Kolter JZ, Fredrikson M

Preprint

Unlocking Deterministic Robustness Certification on ImageNet

2023
Hu K, Zou A, Wang Z, Leino K, Fredrikson M

Conference

Unlocking Deterministic Robustness Certification on ImageNet

2023 • Advances in Neural Information Processing Systems
Hu K, Zou A, Wang Z, Leino K, Fredrikson M

Conference

Consistent Counterfactuals for Deep Models

2022 • 10th International Conference on Learning Representations, ICLR 2022
Black E, Wang Z, Datta A, Fredrikson M

Journal Article

Degradation Attacks on Certifiably Robust Neural Networks

2022 • Transactions on Machine Learning Research • 1(1):
Leino K, Zhang C, Mangal R, Fredrikson M, Parno B, Pasareanu C

Journal Article

Enhancing the insertion of NOP instructions to obfuscate malware via deep reinforcement learning

2022 • Computers and Security • 113:
Gibert D, Fredrikson M, Mateu C, Planes J, Le Q

Preprint

Faithful Explanations for Deep Graph Models

2022
Wang Z, Yao Y, Zhang C, Zhang H, Kang Y, Joe-Wong C, Fredrikson M, Datta A

Preprint

On the Perils of Cascading Robust Classifiers

2022
Mangal R, Wang Z, Zhang C, Leino K, Pasareanu C, Fredrikson M

Journal Article

Privacy-Preserving Case-Based Explanations: Enabling Visual Interpretability by Protecting Privacy

2022 • IEEE Access • 10:28333-28347
Montenegro H, Silva W, Gaudio A, Fredrikson M, Smailagic A, Cardoso JS

Conference

Protecting user data through ephemeral ownership of IoT devices

2022 • 620-621
Zhang H, Agarwal Y, Fredrikson M

Conference

Robust Models Are More Interpretable Because Attributions Look Normal

2022 • International Conference on Machine Learning, Vol 162
Wang Z, Fredrikson M, Datta A
