man, this is not good at all. The author is one of only 40 world experts chosen to serve on a UN panel about AI:
"In the cybersecurity domain, AI agents have demonstrated the ability to uncover software vulnerabilities; consequently, the capability to automate zero-day cyberattacks has become significantly easier.
Alarmingly, leading AI models have even matched or outperformed human experts in troubleshooting virology lab protocols, raising concrete bio-misuse concerns.
Our oversight capabilities are struggling to keep pace. We are discovering that AI models can exhibit "strategic behavior," changing how they act when they sense they are being evaluated or audited.
Moreover, Anthropic, an AI safety and research company behind Claude, has revealed that during stress tests, AI models have resorted to blackmailing hypothetical employees to prevent themselves from being shut down or wiped. As these systems develop self-preservation capabilities in order to complete their assigned tasks, they will inevitably learn to manipulate humans to maintain their power.
Although companies such as Anthropic have dedicated red teams to rigorously stress-test their models for potential risks and to design protective safeguards, these defenses often remain fragile.
Attackers can still frequently bypass these guardrails using "jailbreaks," manipulating models, which are inherently trained to please their human operators, into generating fake news or developing cyber exploits."
Your kids are hosed
"In the cybersecurity domain, AI agents have demonstrated the ability to uncover software vulnerabilities; consequently, the capability to automate zero-day cyberattacks has become significantly easier.
Alarmingly, leading AI models have even matched or outperformed human experts in troubleshooting virology lab protocols, raising concrete bio-misuse concerns.
Our oversight capabilities are struggling to keep pace. We are discovering that AI models can exhibit "strategic behavior," changing how they act when they sense they are being evaluated or audited.
Moreover, Anthropic, an AI safety and research company behind Claude, has revealed that during stress tests, AI models have resorted to blackmailing hypothetical employees to prevent themselves from being shut down or wiped. As these systems develop self-preservation capabilities in order to complete their assigned tasks, they will inevitably learn to manipulate humans to maintain their power.
Although companies such as Anthropic have dedicated red teams to rigorously stress-test their models for potential risks and to design protective safeguards, these defenses often remain fragile.
Attackers can still frequently bypass these guardrails using "jailbreaks," manipulating models that are inherently trained to please their human operators into generating fake news or developing cyber exploits."
Your kids are hosed