GeekAI User Manual
04-Understanding MCP and A2A Protocols
1. What is MCP?

MCP, short for Model Context Protocol, is a standardized protocol used to connect AI models with various data sources and tools. You can think of MCP as the "USB-C interface" of the AI world—just as USB-C allows devices like phones, USB drives, and printers to easily connect to computers and exchange data, MCP provides a unified "plug" for collaboration between AI models and external tools.
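To make the "unified plug" concrete, here is a minimal sketch of what MCP messages look like on the wire. MCP is built on JSON-RPC 2.0; the method names ("tools/list", "tools/call") follow the MCP specification, while the tool name and arguments below are hypothetical:

```python
import json

# A client first asks an MCP server which tools it offers...
list_tools_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# ...then invokes one by name. The tool ("get_current_weather") and its
# arguments are illustrative, not part of any real server.
call_tool_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "get_current_weather",
        "arguments": {"location": "Boston, MA"},
    },
}

print(json.dumps(call_tool_request, indent=2))
```

Any server that speaks these messages can be plugged into any client that speaks them, which is exactly the USB-C-style interchangeability the analogy describes.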

2. MCP vs. Function Call

When it comes to AI calling tools, many people first think of the "Function calling and other API updates" release published by OpenAI on June 13, 2023. OpenAI lets large models call external tools by passing a functions parameter in the API request. For example:

curl https://api.openai.com/v1/chat/completions -u :$OPENAI_API_KEY -H 'Content-Type: application/json' -d '{
  "model": "gpt-3.5-turbo-0613",
  "messages": [
    {"role": "user", "content": "What is the weather like in Boston?"}
  ],
  "functions": [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  ]
}'
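The round trip behind this request can be sketched as follows. When the model decides to call get_current_weather, it returns a function_call object instead of plain text; the application executes the function locally and sends the result back in a follow-up "function" role message. The fetch_weather helper and the hand-written assistant message below are illustrative, not part of the OpenAI API:

```python
import json

def fetch_weather(location: str, unit: str = "celsius") -> dict:
    # Hypothetical local implementation; a real app would call a weather API.
    return {"location": location, "temperature": 22, "unit": unit}

# A simplified, hand-written example of the assistant message the API
# returns when the model decides to call a tool:
assistant_message = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "get_current_weather",
        "arguments": '{"location": "Boston, MA"}',
    },
}

# The application -- not the model -- executes the function...
call = assistant_message["function_call"]
args = json.loads(call["arguments"])
result = fetch_weather(**args)

# ...then feeds the result back as a "function" role message, so the model
# can produce the final natural-language answer on the next request.
followup_message = {
    "role": "function",
    "name": call["name"],
    "content": json.dumps(result),
}
print(followup_message)
```

Note that the model never runs anything itself; it only emits a structured request, and the calling application is responsible for execution and for returning the result.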

At the time, GeekAI also integrated function calling to implement features such as Weibo trending searches and DALL·E image generation.

In fact, some MCP clients directly use the model's built-in Function Call feature to implement the MCP protocol. When I used CherryStudio, I captured the prompts sent to the large model and found that the final prompt looked like this:

{
  "model": "deepseek/deepseek-v3-base:free",
  "messages": [
    {
      "role": "user",
      "content": "Install a filesystem MCP server for me"
    },
    {
      "role": "user",
      "content": "hello"
    }
  ],
  "stream": true,
  "tools": [
    {
      "type": "function",
      "name": "mcp_auto_install_getAvailableServers",
      "function": {
        "name": "fDjZNXZmUB8O62shf4juZD",
        "description": "List all available MCP servers that can be installed. Returns a list of server names and their basic information. Use this to discover what MCP servers are available before installing or configuring them.",
        "parameters": {
          "type": "object",
          "properties": {
            "random_string": {
              "type": "string"
            }
          }
        }
      }
    },
    ......
  ]
}

We can see that CherryStudio bundles all MCP tools into the tools parameter and passes them to the large model. This requires the model to support tool calling; in CherryStudio, only models marked with a "tools" tag can use MCP tool services.

Function Call greatly enhances the capabilities of large models, but it also has two issues:

  1. Not all large models support Function Call.
  2. The accuracy of triggering Function Call varies significantly among different large models—the same question may sometimes trigger a tool call and sometimes not. GPT-4o and Claude are the most accurate, followed by Qwen.

To address these issues, Anthropic proposed the MCP protocol, which was officially released on November 25, 2024. MCP is similar in principle to function calling: both provide tool definitions to the large model and let the model decide whether to call a tool and which tool to call. The difference is that MCP writes the tools' JSON Schema directly into the prompt rather than passing it as an API parameter. For example, Cline's MCP request includes all MCP tool definitions and usage instructions in the prompt:

{
  "model": "deepseek/deepseek-v3-base:free",
  "messages": [
    {
      "role": "system",
      "content": "You are Cline, a highly skilled software engineer with extensive knowledge in many programming languages, frameworks, design patterns, and best practices.\n\n\n====\n\nMCP SERVERS\n\nThe Model Context Protocol (MCP) enables communication between the system and locally running MCP servers that provide additional tools and resources to extend your capabilities.\n\n# Connected MCP Servers\n\nWhen a server is connected, you can use the server's tools via the `use_mcp_tool` tool, and access the server's resources via the `access_mcp_resource` tool.\n\n## baidu-maps (`undefined`)\n\n### Available Tools\n- map_geocode: \n    Name:\n        Geocoding Service\n        \n    Description:\n        Resolves an address into corresponding coordinates. The more complete the address structure and the more accurate the address content, the higher the precision of the resolved coordinates.\n        \n    Args:\n        address: The address to be resolved. Supports up to 84 bytes. Two styles of values are supported:\n        1. Standard structured address information, such as "No. 10, Shangdi 10th Street, Haidian District, Beijing" [Recommended; the more complete the address structure, the higher the resolution accuracy]\n        2. Supports descriptions like "* Road and * Road intersection," such as "the intersection of North First Ring Road and Fuyang Road"\n        The second method does not always return results; it only works if the address description exists in the address database.\n        \n    \n    Input Schema:\n    {\n      \"type\": \"object\",\n      \"properties\": {\n        \"address\": {\n          \"title\": \"Address\",\n          \"type\": \"string\"\n        },\n        \"ak\": {\n          \"title\": \"Ak\",\n          \"type\": \"string\"\n        }\n      },\n      \"required\": [\n        \"address\",\n        \"ak\"\n      ],\n      \"title\": \"map_geocodeArguments\"\n    }\n"
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "<task>\nHow long does it take to ride from Dongguan South City Bus Station to Songshan Lake?\n</task>"
        },
        {
          "type": "text",
          "text": "<environment_details>\n# VSCode Visible Files\n(No visible files)\n\n# VSCode Open Tabs\n(No open tabs)\n\n# Current Time\n2025/4/25 6:52:29 AM (Asia/Shanghai, UTC+8:00)\n\n# Current Working Directory (/Users/yangjian/Desktop) Files\n(Desktop files not shown automatically. Use list_files to explore if needed.)\n# Context Window Usage\n0 / 64K tokens used (0%)\n\n# Current Mode\nACT MODE\n</environment_details>"
        }
      ]
    }
  ],
  "temperature": 0,
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}

I've simplified this for readability; the actual full request is very long and is included in the reference links at the end.

Here, we can see that Cline directly tells the AI in the prompt which MCP tools are available, what each tool does, and how to call them, allowing the AI to learn and use them on the fly. The AI calls the specified MCP tool as needed, receives the result, and then proceeds to the next step, possibly calling another MCP tool until the task is completed.
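The loop a client like Cline runs can be sketched in a few lines. Everything here is a stand-in (the ask_model stub, the JSON tool-call format, the toy tool pool); the point is only the control flow: because the tool definitions live in the prompt, the client just parses each reply for a tool invocation, runs it, feeds the result back, and repeats until the model answers in plain text.

```python
import json

# Hypothetical tool pool; a real MCP client would forward these calls to
# connected MCP servers.
TOOLS = {
    "map_geocode": lambda address: {"lat": 23.02, "lng": 113.75,
                                    "address": address},
}

def ask_model(messages):
    # Stand-in for a chat-completion call. To keep the sketch runnable it
    # requests one tool call, then gives a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return json.dumps({"tool": "map_geocode",
                           "arguments": {"address": "Songshan Lake"}})
    return "Final answer based on the tool result."

def run(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        reply = ask_model(messages)
        try:
            call = json.loads(reply)       # did the model request a tool?
        except json.JSONDecodeError:
            return reply                   # plain text => task finished
        result = TOOLS[call["tool"]](**call["arguments"])
        messages.append({"role": "tool", "content": json.dumps(result)})

print(run("How far is Songshan Lake?"))
```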

The image below shows the entire process of using Cline to call an MCP tool, which may give you a deeper understanding:

In summary, Function Call is a tool-calling capability built in during model training, while MCP is essentially a prompt-engineering technique that adds tool-calling rules to the prompt, allowing the large model to learn and use them on the fly. MCP leverages the few-shot learning capability that almost all large models possess, which makes it a universal protocol. Since the protocol is written into the model's contextual prompt, it is called the "Model Context Protocol."

It's worth noting that this prompt-engineering-based MCP implementation consumes a lot of tokens. A single request can easily exceed tens of thousands of input tokens. The image below shows my Cloudflare AI gateway logs, where almost every request exceeds 15,000 input tokens.

3. What is the A2A Protocol?

The A2A (Agent2Agent) protocol is an interaction protocol between AI agents. First, we need to understand what an agent is. An AI agent consists of three parts:

  • Perception Module
  • Decision Module
  • Execution Module

The perception module is essentially the input module of the agent, which can be keyboard input, voice input, or various other sensors.

The decision module is the large language model itself.

The execution module is essentially a pool of tools composed of numerous MCPs, serving as the "hands and feet" of the agent.

If we think of large models as information engines, then AI agents are essentially task engines that can automatically orchestrate and execute tasks.
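The three modules above compose into a simple sense-decide-act loop. The sketch below is schematic: perceive, decide, and the tool pool are placeholders for real input channels, a real large language model, and real MCP tools.

```python
# Schematic agent loop: perception -> decision -> execution.
# All three components are placeholders for the real modules described above.

def perceive() -> str:
    # Perception module: e.g. keyboard input, voice, or other sensors.
    return "user asked: what is 2 + 3?"

def decide(observation: str) -> tuple[str, dict]:
    # Decision module: stand-in for the LLM, which maps an observation
    # to a tool choice plus arguments.
    return "calculator", {"expression": "2 + 3"}

EXECUTION_TOOLS = {
    # Execution module: the MCP-backed "hands and feet" of the agent.
    # eval() is used only to keep this toy example short.
    "calculator": lambda expression: eval(expression),
}

def agent_step() -> str:
    observation = perceive()
    tool, args = decide(observation)
    result = EXECUTION_TOOLS[tool](**args)
    return f"{observation} -> {result}"

print(agent_step())
```

Chaining such steps, with each result fed back into the next perception, is what turns an "information engine" into a task engine.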

The A2A protocol is the interaction protocol between agents. Agents use this protocol to "help each other" complete tasks. If we think of agents as small teams, the A2A protocol can be seen as the "guidelines for cross-department collaboration." Imagine a scenario: Each department in your company has its own software system. For example, the sales department uses a CRM, the finance department uses an ERP, and the customer service department uses an intelligent ticketing system. In the past, these systems were independent, and data was not shared. You often had to copy and paste between different systems, which was particularly cumbersome.

Now, the company has introduced a new solution where each system has a powerful AI assistant, and they have established a unified communication protocol. When the sales department's AI receives a large order, it directly notifies the finance AI through this protocol to handle accounting and invoicing. The finance AI then notifies the warehouse system's AI to arrange shipping and track logistics, and finally reminds the customer service AI to follow up.

All this communication and collaboration happens proactively and privately between AIs, and we humans only need to nod in confirmation for the entire process to proceed smoothly. In this way, AI assistants from different departments can perform their duties while collaborating with each other. This is the core of A2A technology—enabling AI agents from different platforms and roles to communicate and collaborate, breaking down information silos and completing complex cross-system tasks.
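The cross-department scenario can be sketched as capability-based message passing between agents. In A2A, each agent publishes a JSON "Agent Card" describing what it can do, and peers dispatch tasks to it; the card fields and the in-process "network" below are simplified illustrations, not the full specification:

```python
# Simplified A2A-style setup: each agent advertises its capabilities via a
# card, and peer agents delegate tasks by capability. The card fields are a
# cut-down illustration of the Agent Card idea, and a real deployment would
# exchange these messages over HTTP between separate systems.

AGENTS = {}

def register(name, capabilities, handler):
    AGENTS[name] = {"card": {"name": name, "capabilities": capabilities},
                    "handler": handler}

def send_task(capability, payload):
    # Find a peer agent whose card advertises the capability and delegate.
    for agent in AGENTS.values():
        if capability in agent["card"]["capabilities"]:
            return agent["handler"](payload)
    raise LookupError(f"no agent offers {capability!r}")

register("finance-ai", ["invoicing"],
         lambda p: f"invoice issued for order {p['order_id']}")
register("warehouse-ai", ["shipping"],
         lambda p: f"shipment scheduled for order {p['order_id']}")

# The sales AI closes a deal and delegates follow-up work to its peers:
order = {"order_id": "SO-1001"}
print(send_task("invoicing", order))
print(send_task("shipping", order))
```

The sales agent never needs to know which system handles invoicing; it only names the capability, which is what lets agents from different platforms collaborate without bespoke integrations.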

NVIDIA CEO Jensen Huang said at CES 2025:

As AI technology advances, future IT departments will gradually evolve into HR departments for AI agents, responsible for managing, training, and improving these digital employees to better serve the company.

It is foreseeable that the A2A protocol will become the standard for collaboration between digital employees.

4. Reference Links

  • MCP Protocol
  • MCP Request Example
  • OpenAI Function Calling
极客学长 © 2022-2025 All rights reserved. 粤ICP备19122051号-1