Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Citation copied!