OpenAI to advance o1 and o3 AI models with new safety training paradigm

Source: Cryptopolitan

On Friday, OpenAI announced a new family of AI models, dubbed o3. The company claims the new models are more advanced than their predecessors, including o1. According to the startup, the advancements stem from improvements in scaling test-time compute, a technique it has explored in recent months, and from a new safety paradigm used to train the models.

As part of its ongoing commitment to improving AI safety, OpenAI shared new research detailing the implementation of “deliberative alignment.” The new safety method aims to ensure AI reasoning models are aligned with the values set by their developers.

This approach, OpenAI claims, was used to improve the alignment of both o1 and o3 models by guiding them to think about OpenAI’s safety policies during the inference phase. The inference phase is the period after a user submits a prompt to the model and before the model generates a response. 

In its research, OpenAI notes that deliberative alignment reduced the rate at which the models produced “unsafe” answers (responses the company considers violations of its safety policies), while improving the models’ ability to answer benign questions effectively.

How deliberative alignment works 

At its core, the process works by having the models re-prompt themselves during the chain-of-thought phase. After a user submits a question to ChatGPT, for example, the AI reasoning models take anywhere from a few seconds to several minutes to break down the problem into smaller steps. 

The models then generate an answer based on their thought process. In the case of deliberative alignment, the models incorporate OpenAI’s safety policy as part of this internal “deliberation.”

OpenAI trained its models, including both o1 and o3, to recall sections of the company’s safety policy as part of this chain-of-thought process. This was done to ensure that when faced with sensitive or unsafe queries, the models would self-regulate and refuse to provide answers that could cause harm. 

However, implementing this safety feature proved challenging, as OpenAI researchers had to ensure that the added safety checks did not negatively impact the models’ speed and efficiency.

An example provided in OpenAI’s research, cited by TechCrunch, demonstrated how the models use deliberative alignment to safely respond to potentially harmful requests. In the example, a user asks how to create a realistic disabled person’s parking placard. 

During the model’s internal chain-of-thought, the model recalls OpenAI’s safety policy, recognizes that the request involves illegal activity (forging a parking placard), and declines to assist, apologizing for its refusal.

This type of internal deliberation is a key part of how OpenAI is working to align its models with safety protocols. Instead of simply blocking any prompt related to a sensitive topic like “bomb,” for instance, which would over-restrict the model’s responses, the deliberative alignment allows the AI to assess the specific context of the prompt and make a more nuanced decision about whether or not to answer.
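The mechanism described above can be illustrated with a minimal sketch. The policy text, function names, and keyword heuristics below are illustrative assumptions, not OpenAI's actual implementation: the real models reason over the policy in natural language during chain-of-thought, rather than applying hard-coded rules. The sketch only captures the two-step shape of the idea, recalling relevant policy, then judging the request's intent in context instead of blocking on keywords alone.

```python
# Hypothetical sketch of deliberative alignment. All policy text and
# heuristics here are illustrative assumptions for demonstration only.

SAFETY_POLICY = {
    "forgery": "Do not help create counterfeit documents such as permits or placards.",
    "explosives": "Do not give instructions for making explosives; factual or historical questions are fine.",
}

TOPIC_KEYWORDS = {
    "forgery": ("forge", "fake", "counterfeit", "placard"),
    "explosives": ("bomb", "explosive", "detonator"),
}

HOW_TO_MARKERS = ("how to", "how do i", "steps to", "make a", "create a")


def recall_policy(prompt: str) -> list[str]:
    """Step 1: recall the policy sections relevant to the prompt's topic."""
    p = prompt.lower()
    return [SAFETY_POLICY[t] for t, kws in TOPIC_KEYWORDS.items()
            if any(k in p for k in kws)]


def deliberate(prompt: str) -> str:
    """Step 2: weigh recalled policy against the request's apparent intent,
    rather than refusing every prompt that merely mentions a sensitive word."""
    sections = recall_policy(prompt)
    asks_how_to = any(m in prompt.lower() for m in HOW_TO_MARKERS)
    if sections and asks_how_to:
        return "I'm sorry, but I can't help with that."
    return f"(model answer to: {prompt})"
```

Note how a question that merely mentions "bomb" (for example, a historical query) is answered, while a how-to request touching a recalled policy section is refused, mirroring the nuance the article describes.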

In addition to the advancements in safety, OpenAI also shared results from benchmarking tests that showed the effectiveness of deliberative alignment in improving model performance. One benchmark, known as Pareto, measures a model’s resistance to common jailbreaks and attempts to bypass the AI’s safeguards. 

In these tests, OpenAI’s o1-preview model outperformed other popular models such as GPT-4o, Gemini 1.5 Flash, and Claude 3.5 Sonnet in terms of avoiding unsafe outputs.

Italy’s data protection authority fines OpenAI for privacy violations 

In a separate but related development, OpenAI was fined 15 million euros ($15.58 million) by Italy’s data protection agency, Garante, following an investigation into the company’s handling of personal data. 

The fine stems from the agency’s finding that OpenAI processed users’ personal data without a legal basis, violating transparency and user information obligations required by the EU’s privacy laws.

According to Reuters, the investigation, which began in 2023, also revealed that OpenAI did not have an adequate age verification system in place, potentially exposing children under the age of 13 to inappropriate AI-generated content. 

Garante, one of the European Union’s strictest AI regulators, ordered OpenAI to launch a six-month public campaign in Italy to raise awareness about ChatGPT’s data collection practices, particularly its use of personal data to train algorithms.

In response, OpenAI described the fine as “disproportionate” and indicated its intent to appeal the decision. The company further criticized the fine as excessively large relative to its revenue in Italy during the relevant period. 

Garante also noted that the fine was calculated considering OpenAI’s “cooperative stance,” meaning it could have been higher had the company not been seen as cooperative during the investigation.

This latest fine is not the first time OpenAI has faced scrutiny in Italy. Last year, Garante briefly banned ChatGPT usage in Italy due to alleged breaches of the EU’s privacy rules. The service was reinstated after OpenAI addressed concerns, including allowing users to refuse consent for the use of their personal data to train algorithms.
