
Image by Matheus Bertelli, from Pexels

DeepSeek’s AI Learns To Think For Itself

  • Written by Kiara Fabbri, Former Tech News Writer
  • Fact-Checked by Sarah Frazier, Former Content Manager

Chinese startup DeepSeek, in partnership with Tsinghua University, says it has developed a smarter way to help artificial intelligence models think better and faster, without needing huge computers or expensive resources.

In a rush? Here are the quick facts:

  • DeepSeek created a self-improving AI using Self-Principled Critique Tuning (SPCT).
  • SPCT teaches AI to judge its own work using self-generated rules.
  • The method boosts performance without massive computing power.

The breakthrough comes from a new technique called Self-Principled Critique Tuning (SPCT). Rather than improving performance by simply making models larger, which demands enormous energy and computing power, SPCT teaches the AI to judge its own work against a set of self-generated rules.

It works via a built-in “judge” that checks whether the AI’s response both follows its internal reasoning rules and is suitable to show a human. When the AI produces a solid response, it receives positive feedback, which helps it answer similar questions better in the future.
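The article describes this judge-and-feedback loop only at a high level. As a rough illustration of the idea, and emphatically not DeepSeek’s actual code, a self-critique loop might look like the following minimal Python sketch, where every function name and scoring heuristic is invented for illustration:

```python
# Toy sketch of a self-critique loop (all names and heuristics are invented):
# the model writes grading principles for the query, a "judge" scores each
# candidate response against them, and the best-scored response wins.

def generate_principles(query):
    # In SPCT the model generates its own grading rules per query;
    # here we hard-code two illustrative ones.
    return ["answers the question", "shows reasoning"]

def judge(response, principles):
    # Crude stand-in scoring: one point per principle, using surface
    # proxies for the two principles listed above.
    score = 0
    if len(response) > 10:        # proxy for "answers the question"
        score += 1
    if "because" in response:     # proxy for "shows reasoning"
        score += 1
    return score

def feedback_loop(query, candidates):
    principles = generate_principles(query)
    # The highest-judged candidate is the positive example a real system
    # would reinforce during training.
    return max(candidates, key=lambda r: judge(r, principles))

best = feedback_loop(
    "Why is the sky blue?",
    ["Blue.", "The sky looks blue because short wavelengths scatter more."],
)
```

In this toy version the judge prefers the candidate that actually explains itself, which is the kind of preference signal a reward model would learn from.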

DeepSeek implements this method in its DeepSeek-GRM system, short for Generative Reward Modeling. GRM differs from traditional methods by running checks in parallel, which improves both accuracy and consistency.
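The “parallel checks” idea can be sketched in the same spirit: run several independent judgments over one response and average their votes, trading extra compute for a steadier score. Again, every name and check here is hypothetical rather than DeepSeek-GRM’s real API:

```python
# Toy illustration of parallel checks (hypothetical, not DeepSeek-GRM's API):
# several judge "variants" each apply a different surface check, and their
# votes are averaged into one reward, which is less noisy than a single judge.

from statistics import mean

CHECKS = [
    lambda r: len(r) > 20,          # detailed enough
    lambda r: r.endswith("."),      # well-formed sentence
    lambda r: "because" in r,       # offers a reason
]

def parallel_reward(response):
    votes = [float(check(response)) for check in CHECKS]
    return mean(votes)              # aggregate of the parallel judgments
```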

“We propose Self-Principled Critique Tuning (SPCT) to foster scalable reward generation behaviors,” the researchers wrote in their paper. “SPCT enables [the model] to adaptively posit principles and critiques based on the input query and responses, leading to better outcome rewards.”

With this system, DeepSeek claims its AI can now perform better than competitors like Google’s Gemini, Meta’s Llama, and OpenAI’s GPT-4o, especially when it comes to complex tasks like reasoning or decision-making, as noted by Euronews.

Importantly, DeepSeek says it plans to release these new tools as open-source software, though no release date has been shared.


Image by Oberon Copeland, from Unsplash

Google’s Dreamer AI Learns How To Play Minecraft Without Training

  • Written by Kiara Fabbri, Former Tech News Writer
  • Fact-Checked by Sarah Frazier, Former Content Manager

A new AI system from Google DeepMind has figured out how to collect diamonds in Minecraft — one of the game’s toughest challenges — without any human instructions.

In a rush? Here are the quick facts:

  • Dreamer AI mastered Minecraft diamond quest without human guidance.
  • AI used imagination to predict actions’ outcomes.
  • Dreamer achieved expert level in nine days.

The AI, named Dreamer, taught itself to play Minecraft and reached expert level in just nine days. It did so by simply imagining the future outcomes of its own actions, as reported in a study published in Nature.

“Dreamer marks a significant step towards general AI systems,” said Danijar Hafner, a computer scientist at Google DeepMind, as reported by Tech Xplore. “It allows AI to understand its physical environment and also to self-improve over time, without a human having to tell it exactly what to do,” he added.

Minecraft, played by more than 100 million monthly users, drops players into randomly generated 3D worlds. To find diamonds, a player must complete a sequence of steps: collecting wood, crafting tools, building a furnace, smelting iron, and finally excavating underground.

For most players, the process takes several hours of gameplay. Dreamer, however, used ‘reinforcement learning’ to discover effective actions, retaining successful attempts and discarding unsuccessful ones. The team provided small rewards for each step, such as crafting a plank or mining iron, and reset the game every thirty minutes to prevent the AI from memorizing one particular world.
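The reward-shaping setup described above, small rewards per milestone plus periodic resets, can be illustrated with a toy tabular loop. This is purely a sketch of the general reinforcement-learning idea; Dreamer itself is a far more sophisticated system that learns inside a neural world model, and all names and numbers below are invented:

```python
# Toy reward-shaping sketch (invented, not Dreamer's algorithm): the agent
# earns small rewards for each milestone on the road to a diamond, episodes
# are reset so no single world can be memorized, and successful actions are
# reinforced by nudging up their probability in a simple policy table.

import random

MILESTONES = ["wood", "plank", "furnace", "iron", "diamond"]
STEP_REWARD = {m: 1.0 for m in MILESTONES}
STEP_REWARD["diamond"] = 10.0          # the final goal pays the most

def run_episode(policy, rng):
    # Milestones must be attempted in order; success odds come from the
    # policy's current preference for each step.
    reward, reached = 0.0, []
    for m in MILESTONES:
        if rng.random() < policy[m]:
            reward += STEP_REWARD[m]
            reached.append(m)
        else:
            break                       # failed a prerequisite step
    return reward, reached

def train(episodes=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    policy = {m: 0.5 for m in MILESTONES}
    for _ in range(episodes):           # each episode is a fresh "reset"
        _, reached = run_episode(policy, rng)
        for m in reached:               # retain successful attempts
            policy[m] = min(1.0, policy[m] + lr)
    return policy

policy = train()
```

After training, the early milestones saturate first, mirroring how shaped rewards guide an agent through a long multi-step task.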

Unlike older AI systems that ‘watched’ humans play in order to learn, Dreamer operated autonomously, requiring no human demonstrations or step-by-step guidance. Its internal “world model” allowed it to predict the results of actions before taking them.
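The “predict before acting” idea can be sketched as planning against a predictive model: the agent asks a world model what each candidate action sequence would yield and picks the best predicted outcome, without touching the real environment. The hand-coded transition table below is entirely hypothetical, and Dreamer actually trains an actor-critic on imagined rollouts rather than searching exhaustively, but the principle is similar:

```python
# Sketch of imagination-based planning (hypothetical toy, not Dreamer):
# a hand-coded "world model" predicts (next_state, reward) for each action,
# and the planner exhaustively imagines short action sequences to pick the
# one with the best predicted cumulative reward.

def world_model(state, action):
    # Stand-in for a learned dynamics model.
    transitions = {
        ("no_tools", "mine_with_hands"): ("no_tools", 0.0),
        ("no_tools", "craft_pickaxe"):   ("has_pickaxe", 1.0),
        ("has_pickaxe", "mine_ore"):     ("has_iron", 2.0),
    }
    return transitions.get((state, action), (state, -0.1))

def plan(state, actions, horizon=2):
    best_seq, best_reward = None, float("-inf")

    def rollout(s, depth, seq, total):
        nonlocal best_seq, best_reward
        if depth == horizon:
            if total > best_reward:
                best_seq, best_reward = seq, total
            return
        for a in actions:               # imagine every next action
            ns, r = world_model(s, a)
            rollout(ns, depth + 1, seq + [a], total + r)

    rollout(state, 0, [], 0.0)
    return best_seq
```

In this toy, the planner discovers that crafting a pickaxe before mining is the best imagined plan, even though it never executes a single real action.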

“The world model really equips the AI system with the ability to imagine the future,” Hafner said, as reported by Tech Xplore. Jeff Clune, an AI expert from the University of British Columbia, called the achievement a “major step forward for the field,” reported Tech Xplore.

While humans can typically locate a diamond in 20–30 minutes, Dreamer needed nine days to do the same. Still, the researchers believe the work has far-reaching implications beyond video games.

“This could help robots teach themselves how to achieve goals in the real world,” Hafner added, as reported on Tech Xplore.