OpenAI’s reasoning model often ‘thinks’ in Chinese – No one can explain why

Source: Cryptopolitan

Users have spotted a peculiar behavior in o1, OpenAI’s first “reasoning” AI model. Shortly after its release, people noticed that when a question is posed in English, the model sometimes begins “thinking” in another language, such as Chinese or Persian.

One user said, “[o1] randomly started thinking in Chinese halfway through.” Another user on X asked, “Why did [o1] randomly start thinking in Chinese?”

When given a problem to solve, o1 works through a “thought” process: a sequence of reasoning steps that leads to an answer. If the query is written in English, o1’s final response is in English too. Even so, the model sometimes carries out some of those intermediate steps in another language before reaching its conclusion.

OpenAI has neither explained o1’s odd behavior nor even acknowledged it. So what could be causing it?

AI experts have offered several theories.

Hugging Face CEO Clément Delangue suggested on X that reasoning models like o1 are trained on datasets containing a large number of Chinese characters.

Ted Xiao, a researcher at Google DeepMind, added that companies such as OpenAI use third-party Chinese data-labeling services, and that the switch to Chinese could be an example of “Chinese linguistic influence on reasoning.”

Ted Xiao wrote in an X post, “AGI labs like OpenAI and Anthropic utilize 3P data labeling services for PhD-level reasoning data for science, math, and coding; for expert labor availability and cost reasons, many of these data providers are based in China.”

During training, labels (also called tags or annotations) help models understand and interpret data.

For instance, the labels used to train an image-recognition model might be markings drawn around objects, or captions referring to each person, place, or object depicted in an image.

Research has also shown that biased labels can produce biased models. For example, the average annotator is more likely to label phrases in African-American Vernacular English (AAVE), the informal grammar used by some Black Americans, as toxic. As a result, AI toxicity detectors trained on those labels see AAVE as disproportionately toxic.

Other experts, however, are not convinced by the Chinese data-labeling theory. They point out that o1 is just as likely to switch to Hindi, Thai, or another language besides Chinese while working out a solution.

Instead, these experts argue, o1 and other reasoning models may simply be using whichever language they find most efficient for reaching an objective.

Matthew Guzdial, an AI researcher, said, “The model doesn’t know what language is or that languages are different.” Models do not process words directly; they work with tokens, which can be whole words, syllables, or even individual characters. Like labels, tokens can introduce biases.

For example, many word-to-token translators assume that a space in a sentence marks the start of a new word, even though not all languages use spaces to separate words. Chinese, for instance, does not.
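To illustrate the space assumption, here is a minimal Python sketch. The function `whitespace_tokenize` is hypothetical and far simpler than the subword tokenizers real models use; it only shows what happens when a space is treated as the boundary between words.

```python
def whitespace_tokenize(text: str) -> list[str]:
    """Hypothetical naive tokenizer: treats every run of whitespace as a word boundary."""
    return text.split()

english = "The model thinks step by step"
chinese = "模型一步一步地思考"  # "The model thinks step by step" in Chinese, written without spaces

# The English sentence splits into six separate words.
print(whitespace_tokenize(english))  # ['The', 'model', 'thinks', 'step', 'by', 'step']

# The Chinese sentence contains no spaces, so the whole sentence comes back as a single "word".
print(whitespace_tokenize(chinese))  # ['模型一步一步地思考']
```

Under this assumption, the English sentence is broken into meaningful units while the Chinese one is not, which is one way a tokenizer can end up treating languages unevenly.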

However, Luca Soldaini, a research scientist at the nonprofit Allen Institute for AI, cautioned that there is no way to know for certain. He stated, “This type of observation on a deployed AI system is impossible to back up due to the opaque nature of these models […] It is one of the numerous instances in which the importance of transparency in the construction of AI systems is underscored.”

OpenAI troubles

2024 was nothing short of a rollercoaster for OpenAI. Early in the year, Elon Musk sued the company and its CEO, Sam Altman, arguing that the business had turned away from its original nonprofit mission to put profits ahead of public benefit.

Over the past year, eight U.S. newspapers, including the New York Daily News, the Chicago Tribune, and the Denver Post, also sued OpenAI and Microsoft. They accused the companies of using millions of copyrighted articles to train AI chatbots without permission or payment, alleging that the practice infringed their intellectual property rights.

Mira Murati, OpenAI’s Chief Technology Officer, also announced her departure, a significant moment given how critical her technical expertise had been to the company’s development.

Moreover, OpenAI encountered several difficulties with ChatGPT, such as occasional outages, glitches that resulted in inaccurate or nonsensical responses from the chatbot, and concerns regarding user privacy. There were also instances in which the AI generated biased or offensive content.
