Large language models (LLMs) have advanced significantly. What started as simple tools for generating and translating text is now used in research, decision-making, and complex problem solving. A key factor in this shift is the growing ability of LLMs to think more systematically about problems, evaluating multiple options and dynamically refining their responses. Rather than simply predicting the next word in a sequence, these models can now perform structured reasoning, which makes them more effective at solving complex tasks. Leading models such as OpenAI's o3, Google's Gemini, and DeepSeek's R1 integrate these capabilities to process and analyze information more effectively.
Understanding simulated thinking
People naturally analyze different options before making decisions. Whether we are planning a holiday or solving a problem, we often simulate different plans in our minds to evaluate multiple factors, weigh the advantages and disadvantages, and adjust our choices accordingly. Researchers are integrating this ability into LLMs to improve their reasoning skills. Here, simulated thinking essentially refers to an LLM's ability to reason systematically before generating an answer, as opposed to simply retrieving a response from stored data. A useful analogy is solving a mathematical problem:
- A basic AI might recognize a pattern and quickly generate an answer without verifying it.
- An AI using simulated thinking would work through the steps, check for mistakes, and confirm its logic before responding.
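To make that contrast concrete, here is a toy, runnable Python sketch of the second behaviour: propose an answer, verify it independently, and only then return it. The generate() and check() functions are illustrative placeholders standing in for an LLM call and a verifier, not any model's real API.

```python
# Toy sketch of "verify before answering". generate() stands in for an LLM
# proposing answers; here it returns scripted candidates so the loop can run.

def generate(problem: str, attempt: int) -> int:
    # Stand-in for an LLM guess; the first candidate is deliberately wrong.
    candidates = [398, 408]
    return candidates[min(attempt, len(candidates) - 1)]

def check(problem: str, answer: int) -> bool:
    # Independent verification: recompute the product directly.
    a, b = (int(x) for x in problem.split("*"))
    return a * b == answer

def answer_with_verification(problem: str, max_attempts: int = 3) -> int:
    # Keep proposing answers until one passes verification,
    # rather than returning the first pattern-matched guess.
    candidate = None
    for attempt in range(max_attempts):
        candidate = generate(problem, attempt)
        if check(problem, candidate):
            return candidate
    return candidate  # fall back to the last attempt if nothing verifies

print(answer_with_verification("17 * 24"))  # -> 408
```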
Chain-of-thought reasoning: teaching AI to think in steps
If an LLM is to perform simulated thinking like humans, it must be able to break a complex problem into smaller parts. This is where the chain-of-thought (CoT) technique plays a key role.
CoT is a prompting technique that guides LLMs to work through problems methodically. Instead of jumping to conclusions, this structured reasoning process enables an LLM to divide complex problems into simpler, manageable steps and solve them one by one.
For example, when solving a word problem in mathematics:
- A basic AI might try to match the problem to a previously seen example and provide an answer directly.
- An AI using chain-of-thought reasoning would outline each step, working through the calculations before arriving at the final solution.
This approach is effective in areas requiring logical deduction, multi-step problem solving, and contextual understanding. While earlier models required human-provided reasoning chains, advanced LLMs such as OpenAI's o3 and DeepSeek's R1 can learn and apply CoT reasoning adaptively.
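At the prompt level, the difference is small but consequential. Below is a minimal, model-agnostic sketch of CoT prompting; the ask() function is a hypothetical placeholder for whichever LLM client you use, and the prompts themselves are only illustrative.

```python
# Minimal sketch of chain-of-thought prompting. Only the prompts differ;
# ask() is a placeholder for a real LLM client, not an actual API.

def ask(prompt: str) -> str:
    """Placeholder for a call to an LLM; swap in your client of choice."""
    raise NotImplementedError

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"

# Direct prompt: the model is pushed straight to an answer.
direct_prompt = f"{question}\nAnswer:"

# Chain-of-thought prompt: the model is asked to lay out intermediate steps first.
cot_prompt = (
    f"{question}\n"
    "Let's think step by step. Show each intermediate calculation, "
    "then state the final answer on its own line."
)

# answer_direct = ask(direct_prompt)
# answer_cot = ask(cot_prompt)
```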
How leading LLMs implement simulated thinking
Different LLMs implement simulated thinking in different ways. Below is an overview of how OpenAI's o3, Google DeepMind's Mind Evolution, and DeepSeek-R1 approach simulated thinking, along with their respective strengths and limitations.
OpenAI's o3: thinking ahead like a chess player
While the exact details of OpenAI's o3 remain unpublished, researchers believe it uses a technique similar to Monte Carlo Tree Search (MCTS), a strategy used in AI-driven game systems such as AlphaGo. Like a chess player analyzing several moves before deciding, o3 explores different solutions, evaluates their quality, and selects the most promising one.
Unlike earlier models that rely on pattern recognition, o3 actively generates and refines reasoning paths using CoT techniques. During inference, it performs additional computational steps to construct multiple chains of thought. These are then assessed by an evaluator model, likely a reward model trained to judge logical coherence and correctness. The final answer is selected through a scoring mechanism that favors well-reasoned output.
o3 follows a structured multi-step process. Initially, it is fine-tuned on a large dataset of human reasoning chains, internalizing logical thinking patterns. At inference time, it generates multiple solutions to a given problem, ranks them based on correctness and coherence, and refines the best one if needed. While this method allows o3 to self-correct before responding and improve accuracy, the trade-off is computational cost: exploring multiple possibilities requires significant processing power, making it slower and more resource-intensive. Nevertheless, o3 excels at dynamic analysis and problem solving, placing it among today's most advanced AI models.
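Since o3's internals are unpublished, the sketch below shows only the generic best-of-N pattern described above: sample several reasoning chains, score each with a reward model, and keep the highest-scoring one. The sample_chain() and reward_model() functions are hypothetical placeholders, not o3's actual components.

```python
# Hedged sketch of best-of-N selection with a reward model, not o3's
# actual (unpublished) implementation.

from typing import List, Tuple

def sample_chain(problem: str) -> str:
    """Placeholder: sample one chain-of-thought solution from the model."""
    raise NotImplementedError

def reward_model(problem: str, chain: str) -> float:
    """Placeholder: score a reasoning chain for coherence and correctness."""
    raise NotImplementedError

def best_of_n(problem: str, n: int = 8) -> Tuple[str, float]:
    """Generate n candidate reasoning chains and keep the highest-scoring one.
    More samples generally improve quality at the cost of extra compute."""
    candidates: List[Tuple[str, float]] = []
    for _ in range(n):
        chain = sample_chain(problem)
        candidates.append((chain, reward_model(problem, chain)))
    return max(candidates, key=lambda pair: pair[1])
```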
Google DeepMind: refining answers like an editor
DeepMind has developed a new approach called “Mind Evolution,” which treats reasoning as an iterative refinement process. Instead of analyzing multiple future scenarios, this approach behaves more like an editor improving successive drafts of an essay. The model generates several possible answers, evaluates their quality, and refines the best ones.
This process, inspired by genetic algorithms, improves response quality through iteration. It is particularly effective for structured tasks such as logic puzzles and programming challenges, where clear criteria determine the best answer.
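The following is a hedged sketch of an evolutionary refinement loop in the spirit of that description, not DeepMind's actual implementation. The propose(), fitness(), and refine() functions are hypothetical placeholders for an LLM sampler, an external scorer, and an LLM-driven rewrite step.

```python
# Evolutionary "generate, score, refine" loop, in the style of a genetic
# algorithm over candidate answers.

import random
from typing import List

def propose(task: str) -> str:
    """Placeholder: draft one candidate solution with an LLM."""
    raise NotImplementedError

def fitness(task: str, candidate: str) -> float:
    """Placeholder: external evaluator scoring how well a candidate solves the task."""
    raise NotImplementedError

def refine(task: str, parent: str) -> str:
    """Placeholder: ask the LLM to rewrite and improve a promising candidate."""
    raise NotImplementedError

def mind_evolution_like(task: str, population: int = 6, generations: int = 4) -> str:
    """Keep the fittest drafts each generation and refine them, much as an
    editor iterates over essay drafts."""
    drafts: List[str] = [propose(task) for _ in range(population)]
    for _ in range(generations):
        drafts.sort(key=lambda d: fitness(task, d), reverse=True)
        survivors = drafts[: population // 2]                      # keep the best half
        children = [refine(task, random.choice(survivors)) for _ in survivors]
        drafts = survivors + children                              # next generation
    return max(drafts, key=lambda d: fitness(task, d))
```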
However, this method has limitations. Because it relies on an external scoring system to evaluate response quality, it can struggle with abstract reasoning tasks that have no clear right or wrong answer. Unlike o3, which reasons dynamically in real time, DeepMind's approach focuses on refining existing answers, making it less flexible for open-ended questions.
DeepSeek-R1: learning to reason
DeepSeek-R1 uses a reinforcement-learning-based approach that allows it to develop reasoning capabilities over time, rather than evaluating multiple answers in real time. Instead of relying on pre-generated reasoning data, DeepSeek-R1 learns by solving problems, receiving feedback, and improving iteratively, similar to how students refine their problem-solving skills through practice.
The model follows a structured reinforcement learning loop. It starts from a base model, such as DeepSeek-V3, which is prompted to solve mathematical problems step by step. Each answer is verified through direct code execution, bypassing the need for an additional model to validate correctness. If the solution is correct, the model is rewarded; if it is incorrect, it is penalized. This process is repeated extensively, allowing DeepSeek-R1 to refine its logical reasoning skills and handle more complex problems over time.
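Below is a hedged sketch of that kind of verification-based reward signal, not DeepSeek's actual training code. The solve(), extract_program(), and update_policy() names are hypothetical placeholders; the point is that the reward comes from executing the candidate solution rather than from another model.

```python
# Simplified RL step with a verifiable reward: run the model's program and
# reward it only if the output matches the known answer.

import subprocess

def solve(problem: str) -> str:
    """Placeholder: the policy model writes a step-by-step solution ending in code."""
    raise NotImplementedError

def extract_program(solution: str) -> str:
    """Placeholder: pull the runnable program out of the model's solution."""
    raise NotImplementedError

def verify_by_execution(program: str, expected_output: str) -> bool:
    """Run the candidate program and compare its output to the known answer."""
    try:
        result = subprocess.run(
            ["python", "-c", program], capture_output=True, text=True, timeout=10
        )
    except subprocess.TimeoutExpired:
        return False
    return result.stdout.strip() == expected_output

def reinforcement_step(problem: str, expected_output: str) -> float:
    """One simplified RL step: reward +1 for a verified solution, -1 otherwise."""
    solution = solve(problem)
    program = extract_program(solution)
    reward = 1.0 if verify_by_execution(program, expected_output) else -1.0
    # update_policy(problem, solution, reward)  # placeholder for the policy update
    return reward
```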
The key advantage of this approach is efficiency. Unlike o3, which performs extensive reasoning at inference time, DeepSeek-R1 builds the reasoning ability into the model itself, making it faster and more cost-effective. It is also highly scalable because it does not require a massive labeled dataset or an expensive verification model.
However, this reinforcement-learning-based approach has trade-offs. Because it relies on tasks with verifiable outcomes, it excels in mathematics and coding. It may struggle, though, with abstract reasoning in law, ethics, or creative problem solving. While mathematical reasoning may transfer to other domains, its broader applicability remains an open question.
Table: Comparison of OpenAI's o3, DeepMind's Mind Evolution, and DeepSeek's R1

| Model | Approach | Strengths | Limitations |
| --- | --- | --- | --- |
| OpenAI o3 | Inference-time search over multiple reasoning chains, scored by an evaluator (likely a reward model) | Strong dynamic reasoning and self-correction before responding | Slower and more resource-intensive at inference time |
| DeepMind Mind Evolution | Evolutionary refinement: generate, score, and refine candidate answers | Effective for structured tasks with clear evaluation criteria, such as puzzles and programming | Depends on an external scorer; less flexible for open-ended questions |
| DeepSeek-R1 | Reinforcement learning with verifiable rewards during training | Fast and cost-effective at inference; highly scalable | May struggle with abstract reasoning that lacks verifiable answers |
The future of AI reasoning
Simulated reasoning is a significant step toward making AI more reliable and intelligent. As these models evolve, the focus will shift from simply generating text to developing robust problem-solving abilities that closely resemble human thinking. Future advances will likely focus on enabling AI models to identify and correct errors, integrate with external tools to verify their answers, and recognize uncertainty when facing ambiguous information. However, a key challenge is balancing depth of reasoning with computational efficiency. The ultimate goal is to develop AI systems that thoughtfully consider their responses and their limits, much like a human expert who carefully evaluates every decision before acting.