Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
arXiv: 2510.09462
Mikhail Terekhov (1,2,3,*)
Alexander Panfilov (4,5,6,*)
Daniil Dzenhaliou (3,*)
Caglar Gulcehre (2,3)
Maksym Andriushchenko (4,5,6,†)
Ameya Prabhu (6,7,†)
Jonas Geiping (4,5,6,†)
* Equal contribution
† Equal advising