03-Some Personal Thoughts on AI Agent

If we talk about the most discussed product in the AI circle these days, it must be Manus, released by Monica on March 5th. Some self-media even call it another Chinese AI product that has shocked the world after DeepSeek, and even label it as a "national-level product." It is said that an invitation code for Manus' beta testing is now being sold for up to 50,000 RMB.

As a new type of general-purpose AI Agent, is Manus' popularity due to its technological innovation? Or is it a result of scenario innovation? Or is it simply media hype or excessive marketing by Monica? And what can we learn from the Manus phenomenon? This article will discuss these questions.

First, let me clarify that I have never used Manus, nor do I have an invitation code for registration. My current understanding of Manus comes mainly from watching official videos and documents, some "Manus session replays" online, and my own knowledge of AI Agent technology.

1. What is an AI Agent?

1.1 Simple Definition

An AI Agent (Artificial Intelligence Agent) is like a super-intelligent little robot assistant. It can perceive its environment (as if it has eyes and ears), make decisions based on the information it perceives, and take action.

For example, you can think of it as a smart butler. This butler can monitor various conditions at home, such as temperature and humidity, and even know whether you have returned home. Based on these conditions, it will make decisions, like turning on the air conditioner if the temperature is too high or turning on the lights when you come home.

1.2 Components

Perception Module: This is its "sensory organ." It can receive various types of data, such as text, images, and sounds. For example, for a voice assistant-type AI Agent, the perception module is the microphone, which captures your speech and converts it into text that the Agent can understand.
Decision Module: This is its "brain." After perceiving information, it thinks here about what to do next. Taking the voice assistant example again, after receiving the text "play music," the decision module will figure out how to locate the music player software and instruct it to start playing music.
Execution Module: This is its "hands and feet." It carries out the actual tasks based on the instructions from the decision module. In the music playback example, the execution module interacts with the music player software (calling tools) to make it play music.

2. Levels of AI Agents

I asked DeepSeek to design three levels for AI Agents, similar to the classification of autonomous driving. DeepSeek's results are as follows:

2.1 Task-Guided Type

This is the simplest level. It mainly uses pre-programmed workflows to complete specific tasks. For example, an AI Agent that generates papers: you just input a topic, and it can generate a 10,000-word professional paper for you.

Currently, most intelligent agents built with Coze and Dify fall into this category. Although you can build workflow-based AI Agents that handle complex problems, they can only solve one or a specific type of problem because all execution workflows are pre-set by you.

2.2 Goal-Oriented Type

Agents at this level have clear goals and possess autonomous planning and execution capabilities. They can set their own goals and achieve them step by step. OpenAI DeepResearch and the protagonist of this article, Manus, belong to this category.

For example, if a user asks an Agent to produce a research report on China's silver economy, the Agent will first plan a task list and then execute and confirm each step, ultimately delivering a complete report. Below is a task list generated by Manus.

Task List Generated by Manus

2.3 Self-Learning Type

This is the highest level of Agent. It can not only make decisions and take actions based on goals but also continuously learn and improve itself. It learns from past experiences and constantly adjusts its strategies and behaviors.

For example, an advanced image recognition AI Agent might initially only recognize common objects like cats, dogs, and cars. However, as it processes more images and learns by comparing them with correct labels (which tell it what the images actually contain), it can improve its accuracy and even recognize objects or features it has never seen before.

3. Manus' Core Selling Points

The industry has hyped Manus to an almost mythical level. Coze loyalists might not be happy, saying, "Such a simple thing can be done in minutes with Coze or Dify workflows." Indeed, theoretically, anything Manus can do can also be done with Coze. However, the core difference between Manus and Coze is: Coze workflows require human setup—essentially, humans instruct the AI step by step to achieve goals. Thus, each Coze workflow can only solve specific tasks. For example, you can't ask a Coze AI Agent designed for writing research reports to generate a podcast, nor can you ask a podcast-generating AI Agent to create a video script. But Manus can do all these things because Manus has autonomous planning and execution capabilities, meaning it can design workflows on its own.

Here's a simpler analogy to help you understand: The difference between Manus and Coze is like the difference between DeepSeek-R1 and GPT-4o. Can you achieve DeepSeek-R1's results with GPT-4o? Of course, as long as you add professional chain-of-thought prompts every time you input instructions to GPT-4o. However, not only is this cumbersome, but the bigger issue is that ordinary people can't easily write such professional prompts. For more on how DeepSeek-R1 achieves reasoning capabilities, check out my previous article, DeepSeek-R1 Core Technology Explained, linked at the end of this article.

From a user experience perspective, Manus does something very similar to DeepSeek-R1: it eliminates the user's thinking process. We just need to tell Manus what we want, and it figures out how to achieve the goal on its own, leaving us to wait for the results. If Coze Agents are just tools, then Manus is more like an actual assistant—it thinks and makes decisions on its own, without needing constant micromanagement, making it feel more human.

One last question: OpenAI's DeepResearch also has autonomous planning and execution capabilities, possibly even stronger than Manus (boasting a "Ph.D.-level" Agent). So, what fundamentally distinguishes Manus from OpenAI's DeepResearch (and similar products like Gork3, Mitas Search, and Perplexity's DeepResearch feature)?

I personally think Manus has two advantages over OpenAI's DeepResearch:

Transparency: Manus' execution process is fully transparent—users can see the entire decision-making and execution process. OpenAI's DeepResearch, on the other hand, completes its execution in the background.
Dynamic Error Correction: Manus allows users to interact with it during task execution. For example, if you ask a resume-screening Agent to process 100 resumes, you can temporarily add 10 more resumes after it has processed 50 and say, "Process these 10 as well," without restarting the task. Or you can modify the screening criteria on the fly, and the Agent will continue executing based on the new criteria. Doesn't this feel like having a super-efficient subordinate who can handle anything you throw at them?

4. Is Manus Innovative?

This is a debatable question. If you're an AI application developer, you can probably guess the technical implementation behind Manus: it essentially uses a large model as a decision center (the Agent's decision module) for task orchestration and leverages the model's function-calling capability (Function_call) to invoke external tools (the Agent's execution module) to complete tasks. Many developers believe that Manus is essentially "Lego-style assembly" rather than a breakthrough in underlying technology.

If you're a regular in the AI open-source community, you might also find Manus familiar.

That's right—Manus' product design is very similar to the open-source project AutoGPT, which went viral on GitHub in 2023. Both projects aim to "give a large model hands and feet."

AutoGPT garnered 172K stars on GitHub, making it the top-ranked new open-source project of 2023.

AutoGPT Earned 172K Stars on GitHub

AutoGPT is incredibly powerful, with many built-in plugins. I even fixed a bug in their browser plugin back in 2023. However, AutoGPT must be used via the command line, which isn't very user-friendly. According to the official documentation (README), it now has a GUI, but I haven't tested it yet.

Another similar open-source project, AgentGPT, also gained 33K stars. While not as powerful as AutoGPT, it is easier to deploy and comes with a GUI:

AgentGPT

You just need to input your task name, define the goal, and start the task. The Agent will automatically plan tasks and execute them one by one.

AgentGPT Executing a Task

You might wonder why these two projects didn't become as popular as Manus. Outside the developer community, many people probably haven't even heard of them. The reasons are poor product experience—not only is the operation inconvenient, but tasks often fail inexplicably—and unsatisfactory results.

I think there are two main reasons: one is their decision-making and task-scheduling algorithms, and the other is that the capabilities of large models at the time were not strong enough, leading to low-quality task orchestration.

Manus' developers have fully leveraged the advantages of existing large AI models, virtual machines, and cloud services to address these issues. However, Manus currently seems to lack built-in tools. From their promotional videos and user-shared demos, it mainly uses a browser, text editor, and code executor. AI drawing and video tools are missing, so Manus can only generate statistical charts like bar graphs (which can be created via code, mainly SVG and simple PNGs generated by Python). You can't ask Manus to generate a clothing design or a house renovation plan.

Two more fun facts:

Manus is not built on DeepSeek-R1, simply because AI Agents rely on function-calling capabilities, which DeepSeek-R1 currently doesn't support.
Just yesterday (March 7th), a group of programmers replicated Manus' core functionality in just three hours and open-sourced it. The project gained 12K stars on GitHub in less than 24 hours.

OpenManus Earned 12K Stars in 24 Hours

I pulled the code and ran it immediately. Honestly, it doesn't feel much better than AutoGPT. While the core functionality is there, it's far from being product-ready. Still, they caught the wave of hype, and I hope the community can further improve the project. The source code link is at the end of this article for those interested.

5. Building Infrastructure or Doing Renovation?

Every technological revolution brings two types of people: those who build infrastructure and those who do renovation.

Infrastructure builders invest heavily in foundational work, and usually, only large companies can participate. Renovators, on the other hand, require much less investment—a small team can start a project.

In the internet era, infrastructure builders laid optical cables, while renovators developed internet products.
In the mobile internet era, infrastructure builders developed phones and operating systems, while renovators created apps.
In the cloud computing era, infrastructure builders set up data centers to provide cloud services, while renovators developed cloud-native applications.
Now, in the AI era, if infrastructure builders are those providing large model computing power, then renovators are naturally us—AI application developers.

If the popularity of ChatGPT and DeepSeek showed us the huge advantages and economic benefits of leading in underlying technology, then Manus' breakout success reminds us that once technology matures, the focus often shifts from breakthrough innovations to combining mature technologies, optimizing engineering, and achieving productization and commercialization—what we often call "model innovation."

History has repeatedly shown that both infrastructure building and renovation are promising. For example, China Telecom and Alibaba, or Xiaomi and Tencent—it's hard to say which is more profitable. As long as you find the right niche, you can succeed. However, I believe 99% of people are better suited for renovation, as infrastructure building requires massive investments and often leads to a winner-takes-all scenario, while renovation allows for diverse and flourishing outcomes.

6. Summary

Manus' popularity is the result of technological breakthroughs, precise positioning, and effective communication strategies. The market's urgent demand for an AI assistant that "can actually get things done" has amplified the impact of this innovation.

Manus' core selling points are:

General-Purpose: It has end-to-end autonomous planning and execution capabilities.
Commercial-Ready: It offers a great user experience and is deliverable.
Dynamic Error Correction: It allows users to adjust requirements on the fly during task execution.

Manus' success also reminds us that once technology matures, the focus often shifts from breakthrough innovations to combining mature technologies, optimizing engineering, and achieving productization and commercialization. This is an opportunity for all AI application developers—as long as you have good ideas and can truly solve user needs, don't worry about technical limitations, because every breakthrough in large model technology will become an enabler for your work.

7. Reference Links

AgentGPT Demo: https://agentgpt.reworkd.ai/zh
AutoGPT: https://github.com/Significant-Gravitas/AutoGPT
OpenManus Source Code: https://github.com/mannaandpoem/OpenManus
DeepSeek-R1 Core Technology Explained: https://mp.weixin.qq.com/s/vKl3gzfthMZGIl-T01OvHA