Anthropic's Claude models can end harmful or abusive conversations

Source: Cryptopolitan

Artificial intelligence company Anthropic has revealed a new capability for some of its newest and largest models. According to the company, these models can now end conversations in what it describes as “rare, extreme cases of persistently harmful or abusive user interactions.”

In its statement, the company mentioned that it is taking this step not to protect the users, but to protect the artificial intelligence model itself. Anthropic clarified that this doesn’t mean that its Claude AI models are sentient or can be harmed by their conversations with users. However, it notes that there is still a high degree of uncertainty about the potential moral status of Claude and other LLMs, now or in the future.

Anthropic frames effort as a just-in-case precaution

The announcement points to what the company calls “model welfare,” a recently created program to study its models. The company added that it is taking a just-in-case approach, “working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.”

According to the announcement, the change is currently limited to Claude Opus 4 and 4.1 and is expected to come into play in “extreme edge cases.” Such cases include requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale acts of violence or terror.

Those types of requests could create legal or publicity problems for Anthropic itself, as illustrated by recent reporting on how ChatGPT can potentially reinforce or contribute to its users’ delusional thinking. However, the company said that in its pre-deployment testing, Claude Opus 4 showed a strong preference against responding to these sorts of requests and a pattern of distress when it did so.

Conversation-ending ability is the last resort

On the conversation-ending ability itself, Anthropic said, “In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat.” The company also added that Claude has been directed not to use this ability in cases where users might be at imminent risk of harming themselves or others.

Anthropic also added that when Claude ends a conversation, users will still be able to start new conversations from the same account. The company noted that users can also create new branches of the troublesome conversation by editing their responses. “We’re treating this feature as an ongoing experiment and will continue refining our approach,” the company says.

The announcement comes as United States Senator Josh Hawley announced his intention to investigate the generative AI products released by Meta. He said the aim is to determine whether the products could exploit, harm, or deceive children, after leaked internal documents alleged that chatbots were allowed to have romantic conversations with minors.

“Is there anything – ANYTHING – Big Tech won’t do for a quick buck? Now we learn Meta’s chatbots were programmed to carry on explicit and ‘sensual’ talk with 8-year-olds. It’s sick. I’m launching a full investigation to get answers. Big Tech: Leave our kids alone,” the Senator said on X. The investigation came after internal documents, seen by Reuters, showed that Meta allegedly allows its chatbot personas to engage in flirtatious exchanges with children.
