Elie Schoppik · 2026-01-28

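Why Use Skills? (Part II)

Summary

This lesson takes a closer look at Agent Skills as an open standard and at how they work. A skill is more than a text file: it can reference executable scripts and resource files, giving AI agents domain expertise and repeatable workflows. Through progressive disclosure, skills load only the information that is needed, when it is needed, which protects the context window and makes outputs more predictable in a non-deterministic system.

Key Points

  • Open standard: Agent Skills is an open standard that works across multiple platforms, including Codex, Gemini CLI, and Claude Code, so skills can be ported and shared between environments.
  • Composability: A skill can contain a SKILL.md file, Markdown documents, executable scripts, and resource files, and custom skills can be combined with built-in skills.
  • Domain expertise: Skills give agents domain-specific expertise and repeatable workflows, filling the gaps in what Claude knows about how a particular company or team operates.
  • Progressive disclosure: Only a skill's name and description are added to the context window up front; detailed content is loaded and scripts are run on demand, which avoids polluting the context window.
  • New capabilities: Skills introduce capabilities the agent lacks out of the box, such as generating presentations, processing PDFs, and running specific scripts, which greatly extends what the agent can do.

Video: Why Use Skills? - Part II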


Chinese Translation

在上一节课中,我们看到了如何在 Claude 中创建技能,以及如何从带有数据的提示词转变为打包的技能,从而让我们可以在许多不同的对话中使用它们。现在,让我们更深入地探讨什么是技能,以及支持它们的开放标准。与模型上下文协议(MCP)类似,技能本身也是一种开放标准,可用于许多不同的 AI 应用程序。虽然技能最初是由 Anthropic 创建的,但技能本身现在已经成为一种开放标准,具有特定的规范,并在许多不同的平台上使用,包括 Codex、Gemini CLI、Claude Code、Open Code 等等。

考虑到这一点,让我们来谈谈它是如何工作的。当我们构建 AI 应用程序时,为了使用特定的技能,在使用 Claude AI 或 Claude Desktop 等工具时,我们需要利用某种文件系统。在该文件系统中,我们要加载包含 SKILL.md 文件以及可被引用的子文件夹或文件的文件夹。在这里,我们可以看到与我们之前所做的完全相同。同时,技能本身不仅可以包含其他 Markdown 文档,还可以包含可执行的脚本。

例如,我们有一个处理 PDF 文档的技能。我们需要将 PDF 转换为图像,从表单字段中提取信息,甚至用注释填充 PDF 表单。这需要执行代码。但是,需要执行的代码可以从 SKILL.md 文件中引用。因此,当我们开始探索我们自己的自定义技能和内置技能时,重要的是要注意,技能不仅仅是引用其他文本文件的文本文件,而是可以引用脚本、说明它们做什么以及何时需要执行的文本文件。当我们开始思考创建自定义风格和品牌的方法时,技能还可以包含图标、图像和其他资源。

技能真正大放异彩的地方在于 Claude 可能不完全了解你或你的公司如何运作的领域。你可以想象设计时事通讯、创建品牌指南,这些事情 Claude 有一个大概的概念,但不了解你的公司或你的团队执行它的确切方式。为了让我们在构建自己的智能体时更清楚为什么要引入智能体技能(Agent Skills)。我们过去构建智能体的方式主要集中在单一用途的智能体上。编码、研究、金融、营销等等。这些特定领域的智能体拥有一套特定的工具,以及执行必要任务所需的上下文。

但是,当我们开始构建更多这种单一用途的智能体时,我们开始意识到,在底层,它们真正需要的只是一个简单的脚手架。像 bash 和文件系统这样的底层工具,用于查找、编辑、修改、执行和执行任何必要的任务。这些更简单的智能体更容易评估、理解和扩展。但是这些智能体缺乏的是可靠地完成工作所需的底层上下文和领域专业知识。这种上下文可以通过技能、通过模型上下文协议来提供,但领域专业知识确实也是技能大放异彩的地方。

我们希望金融智能体以特定的方式进行财务分析。我们希望研究智能体拥有必要领域专业知识,按照我们想要的方式进行研究。能够将其移植到许多不同的生态系统和智能体中,这就是我们拥有智能体技能的原因。这些技能为我们提供了过程性知识和特定于用户的上下文,它们可以按需加载。除了领域专业知识外,技能还可以提供可重复的工作流。在一个非确定性的系统中,我们并不总是确切地知道模型的输出会是什么,很难找到产生相同输出的可重复方法。技能允许我们要做的就是提供一个可重复的工作流。通过非常清晰的步骤或指令,允许智能体执行任务,我们可以开始更准确地预测结果。

技能还引入了新能力的概念,即智能体开箱即用时不知道如何做的事情,甚至是 Claude 不知道如何操作的数据。当我们引入这些新能力时,我们只需极少的额外上下文,就为我们的智能体释放了一个完整的生态系统和新功能。当我们思考领域专业知识时,我们希望依靠那些 Claude 可能不知道如何做,或者知道如何做但不针对你特定领域的事情。Claude 可以执行数据分析。Claude 可以执行法律审查。但它如何按照你、你的团队或公司希望的方式来做呢?我们之前看到了执行每周营销活动审查的能力,我们希望这在许多不同的个人和团队中是可预测的。

当我们开始思考其中一些新能力时,比如生成演示文稿、Excel 电子表格、PDF 报告,在必要时执行脚本来执行这些操作,这就是智能体技能可以大显身手的地方。此前在没有技能的情况下,我们要描述指令,试图预测工作流,并一次性将所有必要的文件捆绑在上下文中。我们谈到了技能的可移植性。虽然我们目前在 Claude AI 中看到了技能,但技能可以以完全相同的格式使用,不仅可以跨 Claude Code、Agent SDK 和 API 使用,而且由于 Agent Skills 是一个开放标准,你可以在越来越多的智能体产品中使用它。你可以在一个环境中创建技能,并在许多不同的环境中使用、共享和扩展它们。

当我们说技能是可组合的时,这是我们已经见过的。我们可以采用自定义技能(如分析我们的营销活动),并将其与内置技能(如创建 PowerPoint 演示文稿、PDF 或 Excel 电子表格)结合起来。我们不仅可以一起使用多个技能,还可以将它们结合起来构建复杂且可预测的工作流。我们可以引用必要的技能、必要的步骤,并开始在非确定性系统中创建可预测的输出。

在底层,技能可以包含相当多的信息。我们看到了包含额外 Markdown 文件的示例,甚至看到了包含可执行脚本的示例。你的系统中可能有数百种技能,而我们将更多地看到的是,为了保护上下文窗口,技能是逐步披露(Progressive Disclosure)的。逐步披露的想法是只加载必要的数据,避免污染上下文。我们喜欢把上下文窗口看作是一种公共资源。我们添加到上下文窗口的数据越多,我们消耗的 token 就越多,我们的上下文窗口填满得就越快,上下文退化或错误响应的可能性就可能增加。

为了避免用我们可能不需要的数据污染上下文窗口,技能引入了逐步披露的概念。当从文件系统加载技能时,唯一添加到上下文窗口的数据是技能的名称和描述。这至关重要,以便 Claude 或任何其他系统知道该技能是什么以及如何触发它。一旦该技能被触发,底层的 SKILL.md 就会被加载。这是将数据加载到上下文中的下一阶段。根据需要,如果需要加载和执行额外的文件或脚本,这些将逐步加载。这些额外的资源可以根据需要加载,如果需要加载脚本,这些脚本将在上下文窗口之外单独加载和执行,以避免用不必要的额外 token 污染上下文。

通过使用像 bash 和文件系统这样的工具,Claude 可以只加载必要的信息,只执行必要的脚本和文件读取,并有意地只将必要的内容添加到上下文窗口中。在下一节课中,我们将继续讨论技能,特别是它们如何与模型上下文协议、子智能体、底层工具等其他技术一起使用。

English Script

In the previous lesson, we saw how to create skills in Claude and move from prompts with data to packaged skills that we can use across many different conversations. Now let’s dive deeper and talk about what skills are and the open standard that powers them. Similar to the Model Context Protocol, skills themselves are an open standard that can be used across many different AI applications. While skills were originally created at Anthropic, they are now an open standard with its own specification that is used across many different platforms, including Codex, Gemini CLI, Claude Code, Open Code, and many more.

With that in mind, let’s talk a little bit about how this works. When we build AI applications with tools like Claude AI or Claude Desktop, we need some kind of file system in order to use particular skills. In that file system, we load folders that contain a SKILL.md file and subfolders or files that it can reference. Here we can see exactly what we did previously. At the same time, skills can include not only other Markdown documents but also scripts that can be executed.
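As a rough illustration of the folder layout described above, the following Python sketch builds one skill folder in a temporary directory and lists its files. The skill name `campaign-review`, the `references` and `scripts` subfolder names, and the file contents are all hypothetical choices made for this example; only the presence of a SKILL.md file, with a name and description, alongside referenced files comes from the lesson.

```python
import tempfile
from pathlib import Path


def build_example_skill(root: Path) -> Path:
    """Create a hypothetical skill folder under root and return its path."""
    skill_dir = root / "campaign-review"  # hypothetical skill name
    (skill_dir / "references").mkdir(parents=True)  # assumed subfolder name
    (skill_dir / "scripts").mkdir()                 # assumed subfolder name

    # SKILL.md: name and description up top, then instructions that point
    # at the other files in the folder.
    (skill_dir / "SKILL.md").write_text(
        "---\n"
        "name: campaign-review\n"
        "description: Review weekly marketing campaign data.\n"
        "---\n"
        "\n"
        "# Campaign review\n"
        "\n"
        "1. Read references/metrics.md for metric definitions.\n"
        "2. Run scripts/summarize.py on the exported CSV.\n",
        encoding="utf-8",
    )
    (skill_dir / "references" / "metrics.md").write_text(
        "# Metrics\n", encoding="utf-8"
    )
    (skill_dir / "scripts" / "summarize.py").write_text(
        "print('summary')\n", encoding="utf-8"
    )
    return skill_dir


def list_files(skill_dir: Path) -> list[str]:
    """Return every file in the skill folder as a sorted relative path."""
    return sorted(
        p.relative_to(skill_dir).as_posix()
        for p in skill_dir.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill = build_example_skill(Path(tmp))
        for name in list_files(skill):
            print(name)
```

The point of the sketch is only that a skill is an ordinary folder: one SKILL.md entry point plus whatever documents and scripts it refers to.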

For example, we have a skill for working with PDF documents. We need to convert PDFs to images, extract information from form fields, and even fill PDF forms with annotations. This requires code to be executed, and that code can be referenced from the SKILL.md file. So as we start to explore our own custom skills and built-in skills, it’s important to note that skills are not just text files that reference other text files, but text files that can reference scripts and describe what they do and when they need to be executed. Skills can also include icons, images, and other assets as we start to think about ways of creating custom styles and brands.

Where skills really shine is in places where Claude might not know exactly how you or your company operates. You can imagine designing newsletters or creating brand guides: things that Claude has a general idea about, but not the exact way that your company or your team does them. To give some more idea of why we bring Agent Skills into the mix when we’re building our own agents, consider how we used to think about building agents. It centered around agents with a single purpose: coding, research, finance, marketing, and much more. These domain-specific agents had a particular set of tools and the context they needed to perform the necessary task.

But as we started to build more of these single-purpose agents, we started to realize that under the hood, all that they really need is a simple scaffolding: underlying tools like bash and a file system to find, edit, modify, execute, and perform whatever tasks are necessary. These simpler agents are easier to evaluate, understand, and scale. But what these agents lacked was the underlying context and domain expertise to do the job reliably. That context can be provided through skills or through the Model Context Protocol, but domain expertise is really where skills shine.

We want finance agents to perform financial analysis in a particular fashion. We want research agents to have the domain expertise necessary to research the way that we want. And we want to be able to port that across many different ecosystems and agents, and that’s why we have Agent Skills. These skills provide the procedural knowledge and the user-specific context that agents can load on demand. In addition to domain expertise, skills can also provide a repeatable workflow. In a non-deterministic system, where we don’t always know exactly what the output of the model is going to be, it can be difficult to find repeatable ways of producing the same output. What skills allow us to do is provide a repeatable workflow, with clearly articulated steps or instructions that allow the agent to perform a task whose result we can start to predict with more accuracy.

Skills also introduce the idea of new capabilities, things that an agent does not know how to do out of the box or even data that Claude has no idea how to operate on. When we bring in these new capabilities, we unleash an entire ecosystem and new functionality for our agents with minimal additional context. As we think about domain expertise, we want to lean on things that Claude might not know how to do or knows how to do but not for your particular domain. Claude can perform data analysis. Claude can perform legal review. But how does it do it the way that you or your team or company want it to be done? We previously saw the ability to perform weekly marketing campaign reviews, and we want that to be predictable across many different individuals and teams.

As we start to think about some of these new capabilities, things like generating presentations, Excel spreadsheets, and PDF reports, and executing scripts when necessary to perform those actions, this is where Agent Skills can shine. What we saw previously, without skills, was the idea of describing our instructions, trying to predict workflows, and bundling all of the necessary files in context at one time. We talked a little bit about the portability of skills. While we’ve seen skills so far in Claude AI, skills can be used in the exact same format across Claude Code, the Agent SDK, and the API, and since Agent Skills are an open standard, you can use them across a growing number of agent products. You can create skills in one environment and use, share, and scale them across many different environments.

When we say that skills are composable, this is something that we’ve seen already. We can take custom skills like analyzing our marketing campaign and we can combine that with built-in skills like creating PowerPoint presentations, PDFs, or Excel spreadsheets. Not only can we use multiple skills together, but we can combine them to build complex and predictable workflows. We can reference the skills necessary, the steps necessary, and start to create predictable outputs in a non-deterministic system.

Under the hood, skills can contain quite a bit of information. We saw examples with additional Markdown files and even examples with scripts that can be executed. You can have hundreds of skills across your system, and what we’re going to see quite a bit more of is that, to protect the context window, skills are progressively disclosed. The idea of Progressive Disclosure is to load only the data necessary and avoid polluting the context. We like to think of the context window as a public good. The more data that we add to the context window, the more tokens we consume, the faster our context window fills up, and the more likely context degradation or incorrect responses become.

In order to avoid polluting the context window with data that we might not need, skills introduce the idea of Progressive Disclosure. When skills are loaded from the file system, the only data that gets added to the context window is the name and description of each skill. This is essential so that Claude or any other system knows what the skill is and how to trigger it. Once that skill is triggered, the underlying SKILL.md is loaded. This is the next phase of loading data into context. And depending on what is required, if there are additional files or scripts that need to be loaded and executed, those will be loaded progressively. These additional resources can be loaded as needed, and if there are scripts that need to be run, those scripts are loaded and executed separately from the context window to avoid polluting it with additional tokens that are not necessary.
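The first two phases described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not how Claude or any other product implements loading: it assumes each SKILL.md starts with a block delimited by `---` lines holding flat `name:` and `description:` fields, and the names `SkillMetadata`, `split_frontmatter`, `discover`, and `load_body` are hypothetical helpers invented for this example.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillMetadata:
    """The only data kept in context before a skill is triggered."""
    name: str
    description: str
    path: Path


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (header fields, body).

    Simplified on purpose: handles flat `key: value` lines only,
    not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    fields: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return fields, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat everything as body


def discover(skills_root: Path) -> list[SkillMetadata]:
    """Phase 1: collect only the name and description of every skill."""
    found = []
    for skill_file in sorted(skills_root.glob("*/SKILL.md")):
        fields, _ = split_frontmatter(skill_file.read_text(encoding="utf-8"))
        if "name" in fields and "description" in fields:
            found.append(
                SkillMetadata(fields["name"], fields["description"], skill_file)
            )
    return found


def load_body(skill: SkillMetadata) -> str:
    """Phase 2: read the full instructions once the skill is triggered."""
    _, body = split_frontmatter(skill.path.read_text(encoding="utf-8"))
    return body
```

With hundreds of skills installed, `discover` would contribute only a short name and description per skill, while `load_body` is called for the one skill that the current task actually triggers.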

By using tools like bash and a file system, Claude can load only the information that’s necessary, execute only the scripts and file reads that are necessary, and intentionally add only what is necessary to the context window. In the next lesson, we’ll continue talking about skills, and particularly how they’re used alongside other technologies like the Model Context Protocol, sub-agents, underlying tools, and much more.
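To make the idea of executing a script outside the context window concrete, here is a small Python sketch. It assumes the script is a Python file and that only its printed output is returned to the model; the helper name `run_skill_script`, the `max_chars` cap, and the demo script `count_rows.py` are hypothetical, and real agent products may run scripts quite differently.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_skill_script(script: Path, *args: str, max_chars: int = 2000) -> str:
    """Run a skill's script in a separate process and return its output.

    The script's source is never read into the caller's text; only what
    the script prints (capped at max_chars) comes back.
    """
    result = subprocess.run(
        [sys.executable, str(script), *args],
        capture_output=True,
        text=True,
        timeout=30,
        check=False,
    )
    # On failure, surface the error text so the agent can react to it.
    output = result.stdout if result.returncode == 0 else result.stderr
    return output[:max_chars]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "count_rows.py"  # hypothetical skill script
        script.write_text(
            "# Imagine many lines of parsing logic here.\n"
            "rows = ['a', 'b', 'c']\n"
            "print(f'{len(rows)} rows')\n",
            encoding="utf-8",
        )
        print(run_skill_script(script).strip())
```

The design choice worth noticing is the boundary: the script can be arbitrarily long, but the amount of text that reaches the context window is bounded by what it prints.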