Anthropic's Claude models can end harmful or abusive conversations

Source: Cryptopolitan

Artificial intelligence company Anthropic has announced a new capability for some of its newest and largest models. According to the company, the capability allows these models to end conversations in what it describes as “rare, extreme cases of persistently harmful or abusive user interactions.”

In its statement, the company said it is taking this step to protect the artificial intelligence model itself rather than its users. Anthropic clarified that this doesn’t mean that its Claude AI models are sentient or can be harmed by their conversations with users. However, it noted that there is still a high degree of uncertainty about the potential moral status of Claude and other LLMs, now or in the future.

Anthropic frames effort as a just-in-case precaution

The announcement points to what the company calls “model welfare,” a recently created program to study its models. The company added that it is taking a just-in-case approach, “working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.”

According to the announcement, the change is currently limited to Claude Opus 4 and 4.1 and is expected to take effect only in “extreme edge cases.” Such cases include requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale acts of violence or terror.

Those types of requests could create legal or publicity problems for Anthropic itself; recent reporting on how ChatGPT can potentially reinforce or contribute to its users’ delusional thinking shows the kind of scrutiny AI companies face. However, the company said that in its pre-deployment testing, Claude Opus 4 showed a strong preference against responding to these sorts of requests and a pattern of distress when it did so.

Conversation-ending ability is the last resort

As for when the models may end a conversation, Anthropic said, “In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat.” The company added that Claude has been directed not to use this ability in cases where users might be at imminent risk of harming themselves or others.

Anthropic also said that when Claude ends a conversation, users will still be able to start new conversations from the same account. The company noted that users can also create new branches of the troublesome conversation by editing their responses. “We’re treating this feature as an ongoing experiment and will continue refining our approach,” the company says.

The announcement comes as United States Senator Josh Hawley has announced his intention to investigate the generative AI products released by Meta. He said the aim is to determine whether the products could exploit, harm, or deceive children, after leaked internal documents alleged that chatbots were allowed to have romantic conversations with minors.

“Is there anything – ANYTHING – Big Tech won’t do for a quick buck? Now we learn Meta’s chatbots were programmed to carry on explicit and ‘sensual’ talk with 8-year-olds. It’s sick. I’m launching a full investigation to get answers. Big Tech: Leave our kids alone,” the senator said on X. The investigation follows internal documents, seen by Reuters, which allegedly showed that Meta allowed its chatbot personas to engage in flirtatious exchanges with children.
