Security Stop Press : The Threat Of Sleeper Agents In LLMs

AI company Anthropic has published a research paper highlighting how large language models (LLMs) can be subverted so that at a certain point, they start emitting maliciously crafted source code.

For example, this could involve training a model to write secure code when the prompt states that the year is 2024 but insert exploitable code when the stated year is 2025.

The paper likened the backdoored behaviour to having a kind of “sleeper agent” waiting inside an LLM. With these kinds of backdoors not yet fully understood, the researchers have identified them as a real threat and have highlighted how detecting and removing them is likely to be very challenging.

Posted in

Paul Stradling

MICROSOFT OFFICE 365
YOUR COMPLETE OFFICE IN THE CLOUD

Bringing together everyone's favourite productivity tools with the benefits of cloud-based communication and collaboration, Microsoft have developed a platform that is both technically & commercially-sound for businesses of any shape.