Elie Schoppik · 2026-01-28

使用 Agent SDK 的技能

摘要

本课程讲解如何使用 Claude Agent SDK 构建一个研究智能体应用,该应用能够利用技能系统协调多个子智能体,从多个信息源(文档、代码仓库、网络搜索)收集数据并生成结构化的学习指南。课程展示了如何集成 MCP 服务器连接外部服务(如 Notion),以及如何通过技能定义可预测的工作流程和输出格式,最终实现对开源工具的自动化研究和文档生成。

要点

  • Agent SDK 架构:主智能体作为协调者,调度文档研究员、仓库分析器和网络研究员三个子智能体并行工作
  • 技能系统应用:通过 learning-a-tool 技能定义研究方法论、渐进式学习路径和标准化输出格式
  • MCP 服务器集成:使用 Notion MCP 服务器将生成的文档自动写入团队知识库,实现工作流自动化
  • 工具权限管理:需要在 allowed_tools 中显式声明 Write、Bash、WebSearch、WebFetch 等工具,确保主智能体和子智能体都能正确使用
  • 渐进式学习路径:从概述和动机开始,经过安装、核心概念、实践模式,逐步深入到高级用法和生产级示例

视频信息:Skills with the Agent SDK


中文翻译

在最后一节课中,我们将使用 Claude Agent SDK 创建一个研究智能体。该智能体将利用一项技能,基于文档、GitHub 仓库和网络搜索,为一个开源工具创建学习指南。我们开始吧。既然我们已经看到了如何通过 Messages API 和 Claude Code 配合 Claude 使用技能,接下来谈谈如何在 Claude Agent SDK 中使用技能。作为复习,Claude Agent SDK 是一种以编程方式构建你自己的智能体应用程序的途径,它使用了与 Claude Code 相同的内部执行框架。

我们将在这里构建的是一个通用研究智能体。主智能体将能够从多个来源研究信息并将其综合成摘要。它将调度三个不同的子智能体,分别用于分析文档、分析和下载仓库,以及通过搜索网络研究信息。让我们先看看这些提示词,然后再看看用于指导主智能体的技能,它规定了研究方法以及需要提取和综合哪些内容。

首先,我们有我们的主智能体提示词。这是拥有三个可用子智能体访问权限的协调者,这三个子智能体具有以下能力:从文档中查找信息、分析仓库结构、查找文章、视频和社区内容并将它们整合在一起。在这个特定的应用程序中,我们提到如果提供了技能,我们要让它遵循特定的模式。对于我们正在构建的应用程序,可能会提供技能,也可能不提供,但在我们的例子中,我们将提供一个。如果技能与用户的请求匹配,我们需要精确地遵循该技能的说明。

由于我们是从头开始构建这个智能体应用程序,我们要非常有意识地处理提供技能或未提供技能时分别该做什么。接下来,我们还有几条高层次的委派指南,说明如何生成子智能体,以及在收到结果后如何综合所有这些信息片段。让我们简要看一下子智能体的一些提示词。文档研究员将拥有 WebSearch 和 WebFetch 的访问权限。我们为它提供了定位文档的流程、特定的输入格式、指南,以及按特定方式返回研究发现的输出格式。

对于仓库分析器,我们也提供 WebSearch 来查找仓库,Bash 命令来克隆和运行 git,然后还有读取和查找文件及文件中数据的能力。同样,我们提供了一个过程、输入格式、指南和一个输出。最后,我们的网络研究员也利用了 WebSearch 和 WebFetch。这允许我们搜索与该主题相关的内容,并从主智能体那里接收提取说明。我们也提供了指南以及必要的输出格式,如果没有指定输出格式,则遵循默认结构。当我们设置使我们的智能体工作所需的代码时,所有这些提示词将一起使用。

最后,让我们谈谈我们将在这里使用的技能。我们有一个名为 learning-a-tool 的技能,它的目的是指导主协调者。我们不会在各个子智能体中使用该技能,而是把它作为创建可预测模式的一种方式,让主智能体知道理想的工作流,以及调度哪些子智能体、如何调度。我们给技能一个名称和一个描述:在这个例子中,我们要为编程工具创建学习路径,定义应该研究哪些信息,并指定如何遵循一套研究方法,一直到产出一份全面的学习路径。

首先,我们在这里有一个非常明确的工作流。我们从研究阶段开始,为负责官方文档的子智能体明确指定要查找的内容。对于仓库分析器,采取类似的方法;对于网络研究员,也非常相似。所以我们使用这个技能,为主智能体如何与它拥有的子智能体协作提供一个稳定且可预测的工作流。一旦拿到这些数据,我们就将内容组织成渐进的级别。在这里,我们利用逐步披露(Progressive Disclosure)加载另一个 Markdown 文件作为事实来源。在渐进学习文件中,我们可以看到针对各个级别的大量内容:从概述和动机开始,经过安装、核心概念、实践模式,一直到接下来该学什么。

这种渐进学习让我们可以划分级别,既知道如何从零开始,也知道最终在哪里深入。虽然这个初始技能对于学习一个工具很有用,但你也可以想象,根据我们处理的数据,还可以有额外的技能,比如用于比较两个工具。进入使用此技能的后续阶段时,我们拿到这些数据,先指定一个结构,再指定一个输出。我们对所采用的确切格式非常挑剔。这里的目标是得到一个包含概述、资源、学习路径和代码示例的学习环境,把所有子智能体的研究成果整合成我们想要的特定输出格式,并且做到一致和可预测。

既然我们已经从高层次上了解了要构建的应用程序,接下来再叠加最后一部分。可以想象,我们希望把输出写入一个可以与队友共享、界面也更友好的集中位置。为此,我们将使用 Notion。为了连接到 Notion,我们将使用一个 MCP 服务器,并引入执行该操作所需的工具。现在我们已经查看了主智能体和子智能体的底层提示词,以及将要使用的技能,让我们运行 uv init 初始化一个项目,并添加必要的依赖项,如 claude-agent-sdk、python-dotenv 和 asyncio。

一旦我们安装了这些依赖项,让我们继续创建一个名为 agent.py 的文件。所以我将继续,制作一个新文件,称之为 agent.py。在我们的 agent.py 中,我要添加必要的代码,以便仅使用 Claude Agent SDK 开始一个小例子。这里的样板代码引入了 asyncio 来运行此环境,引入 dotenv 来加载环境变量,然后从我们的 utils 中引入 display_message 函数。为了提供一些背景信息,display_message 为我们提供了一堆用于截断和格式化输入的辅助工具,并且它为我们提供了一种很好的方式来直观地显示来自主智能体和子智能体的信息。这是与我们在使用 API 时看到的非常相似的代码,我们在每次工具操作和迭代中都得到了很好的输出。

首先,我们设置我们的 Claude 智能体。我们在这里传入一个 system_prompt,它之后会改变;再传入 allowed_tools,它同样会改变,但我们想先从最基础的部分开始。为了开始一个简单的对话,我们设置一个循环:接受一些用户输入,通过我们的模型运行它,取回响应并将其发送回用户。让我们去看看那是什么样子的。我将再次打开终端,运行 uv run agent.py。这会为我们提供一个终端环境,可以在其中开始对话。我就从问"你好吗?"开始。在这种情况下,我不会得到太多有价值的信息,因为这只是一个乐于助人的助手。所以我们现在要开始叠加的,是让智能体访问 MCP 服务器和正确工具的能力。

让我们继续对我们的 main 函数进行一些修改。就像我们提到的那样,allowed_tools 将会改变。所以我们要开始做的是添加我们的子智能体需要使用的工具,以便它们可以按预期工作。只读工具如 read、Grep 和 Glob 默认允许。但是当我们想开始做像写入文件、搜索网络和通过 bash 执行命令这样的事情时,我们需要明确传入它们。所以我们将引入 Write 工具、Bash 工具,以及我们的 WebSearch 和 WebFetch 工具。我们之前看到我们的子智能体将利用这些特定工具。我们要分析仓库的智能体需要 Bash 来运行 git 命令和写入文件,我们的文档研究员和网络研究员将利用搜索和获取。

现在我们已经引入了这些工具,接下来要添加的是要连接的 mcp_servers。我们将使用 mcp_servers 关键字参数并指定 MCP 服务器的名称,在我们的例子中是 notion。我们会传入一些默认配置,指定运行 notion 服务器的命令以及所需的 Notion 环境变量。所以在继续之前,我们要确保从 .env 文件加载 Notion 令牌,并导入 os 模块以便正确读取它。现在我们已经正确加载了 MCP 服务器,接下来需要利用 Notion 提供的工具。如果愿意,我们现在可以直接问 Claude:你从这个 MCP 服务器获得了哪些工具?或者,我们也可以明确地把它们添加进来。

具体做法是使用 mcp 前缀加上服务器名称,再跟上工具名称。在这种情况下,我们将使用 Notion 提供的所有工具。我们需要确保 mcp_notion_* 出现在 allowed_tools 中,以便授予主智能体使用这组工具的权限。我们可以逐个明确添加工具名称,或者像本例这样,直接包含 mcp_notion 提供的所有可用工具。现在我们已经设置好 mcp_servers 和 allowed_tools,接下来引入子智能体及其定义。我们之前提到 system_prompt 将会改变。首先,我们要加载我们拥有的所有提示词:我们将引入一个常量和一个用于加载这些提示词的辅助函数,并在 main 函数内部调用该函数来引入这些提示词。

我们将利用这些 Markdown 文件来加载必要的文本并将其传递给我们的智能体选项。在我们继续更新主智能体之前,我们要添加一个字典,引用我们所有的带有定义的智能体。我们要引入 AgentDefinition 类,我们要确保正确导入它。我们可以在 AgentDefinition 中看到,我们有一个针对子智能体的描述、一个指定智能体指令的提示词,以及我们想要该智能体使用的工具。类似于我们在 Claude Code 中所做的配置。你可以看到这里,我们仍然需要使用我们的 main_agent_prompt 以及这个智能体字典。所以我们将使用 main_agent_prompt 更新我们的 system_prompt。然后我们将确保传入一个额外的关键字参数 agents,它引用带有我们智能体定义的字典。

正如你在这里看到的,我们的研究员、分析器和网络研究员使用的都是我们在这里定义的工具。重要的是,要确保在 allowed_tools 中列出主智能体和子智能体需要使用的所有工具,否则即使你在这里为子智能体配置了这些工具,它们也无法使用。现在智能体已经设置好了,我们还需要加入最重要的 Task 工具,以确保可以调度子智能体并给它们分配任务。最后需要添加的一部分是技能。好消息是,为了添加技能,我们只需要再添加一个工具,那就是 Skill 工具。

由于这个环境里有文件系统,也能通过 Bash 工具执行代码,我们只需要添加这个 Skill 工具,就可以正确读取技能并了解如何最好地使用它们。与 Claude Code 类似,技能定义在 .claude 文件夹下名为 skills 的文件夹中。确保你的 Markdown 文件叫 SKILL.md,文件夹名为 skills(复数)。现在我们已经添加了用于处理技能的工具,这里还需要传入一个关键字参数,用来指定从哪里查找这组技能。我们使用名为 setting_sources 的关键字参数:在这里指定 user(用户),即查找主目录中的技能(如果有的话),以及 project(项目),也就是我们为这个应用程序存放技能的位置。

现在我们把这一切放在一起,让我们继续测试我们的智能体。我们再次打开终端。我将继续退出,让我们用我们所做的更改再次运行此应用程序。我们将从了解一点关于 MinerU 的信息开始。对于不熟悉的人来说,MinerU 是一个用于 PDF 提取的开源库。我们使用这个例子的原因是,这不是 Claude 可能会从其初始训练数据中知道很多的东西。这将需要外部研究、分析代码库、社区文档和其他来源。我们将要求创建一个学习指南,然后先给我展示计划。

在这里,我们会开始看到技能被调用,输入就是那个名为 learning-a-tool 的技能以及我们指定的参数。可以看到我们首先得到了计划。子智能体要做的事情仍然需要接着运行,但就像 Claude Code 的计划模式一样,我们可能想在开始行动、消耗 token 和时间之前先看看计划是什么。我们可以看到由不同研究员并行调查的研究阶段、技能所要求的结构,以及最终期望的输出。这看起来是个好计划,所以我们让它继续进行。

它将从生成 docs_researcher 子智能体、生成 repo_analyzer 和 web_researcher 开始,并并行执行这些,使用我们在 allowed_tools 下添加的工具,我们也将其传入了我们的子智能体。我们可以并行看到,docs_researcher 正在前往文档,repo_analyzer 正在 GitHub 上查看,web_researcher 正在教程和 YouTube 指南中搜索。我们正在使用 bash 命令从 GitHub 仓库中提取信息,同时在 YouTube 频道上搜索视频演示。这些智能体正在并行交互,从不同的数据源获取数据,以将其整合到一个引人注目的教程中。

现在子智能体已经完成了它们的工作,我们将创建全面的指南,根据这些研究把所有必要的文件整合在一起。按照仓库分析器提示词中的指示,我们已经克隆了 MinerU 的仓库并保留在这里,同时开始构建用于学习的文件夹结构。可以看到,这里有 README 和资源,以及正在整合的代码示例。在 README 文件中,它为我们给出了学习路径:我们要学什么、如何使用本指南,以及很重要的、可能需要的时间估计。可以看到它已经创建了 README 和资源,学习路径仍在生成中。在资源里,我们有 MinerU 的链接和参考资料,让我们来看一看。

在资源中,我们有文档、仓库、PyPI 包以及该库底层的论文,还有快速入门指南、文档和相关项目,以及从社区引入的更多信息,包括对各类文章和新闻报道的深入整理。现在可以看到学习路径已经创建,是时候创建代码示例了。让我们看看这个学习路径:从概述和动机开始,它解决了什么问题?我们描述了该库的起源故事、之前已有哪些工具,以及那些工具存在的一些问题。可以看到这是一份相当深入的指南和学习路径,你可以想象沿着它学习会持续很长时间,从几乎一无所知一直到成为使用该库的专家。

接着我们进入该库各个后端的一些独特功能,一直到代码示例,以及如何尽可能高效地使用它的许多不同方面。可以开始看到,用于 hello world 示例、概念和模式的代码文件正在被编写。对于 hello world 示例,我们有一份不错的 README,介绍最初的几步、用于了解如何上手的简单提取示例,以及安装步骤。如果我们希望在安装或模式方面使用特定的库,随时可以把这一点添加到技能中。但就目前而言,这已经为我们上手运行这个库提供了一个很好的起点。

再来看一些核心概念,这些内容目前正在创建。现在它们已经完成,我们可以在 README 中看到接下来该去哪里。一旦让库运行起来,我们就可以开始了解该库的一些基本概念,并比较不同后端的速度差异。最后,我们将用第三个文件夹创建实用模式和示例。打开这个文件夹,可以看到里面有真实世界的处理管道和生产用例,其中包括特定模式的示例,以及使用此库的相当深入的代码示例,带有文档字符串、注释,以及充分利用此库所需的一切。

最后,我们通过验证并创建一份摘要文档来收尾,确保一切都已正确完成。看一下输出,它为我们提供了一份完整的学习指南、技能中指定的目录结构、带有我们所请求级别的学习路径,以及关键功能和帮助快速上手的快速入门。我们要做的最后一件事,是把 resources.md 这个文件写入 Notion 中的 resources 子页面。这个页面已经存在,所以先看看它现在的样子,然后我们会提示智能体使用 MCP 服务器完成必要的写入。

我们可以看到在 Notion 的这个学习部分下,我有一个名为 resources 的子页面。这里的目标是使用 MCP 服务器将我们在 resources.md 中拥有的内容填充到这里。所以让我们继续要求我们的智能体将该文件写入 Notion 中的那个子页面。我们要明确我们在 Notion 中使用的工具,并允许它使用我们可用的工具。我们找到了 resources 页面。我们要读取 resources.md 并使用丰富的 Notion 块将其转换为 Notion 中的正确格式。你可以看到这里我们使用了来自 Notion 的多个工具,分批进行,添加快速入门指南、API 文档以及 resources.md 中的其余信息。我们可以看到在 resources 文件中,它正在根据 resources.md 中的文档动态更新。随着这个结束,我们将看到该文件中的所有内容都出现在我们的 Notion 页面上。

现在任务完成了,让我们去看看 Notion 页面是什么样子。可以看到,这里有官方文档、教程、视频资源、社区频道,所有来自那个 Markdown 文件的数据现在都已写入 Notion。我们利用了技能、MCP 服务器、智能体和子智能体,全部通过 Agent SDK 实现。你可以想象叠加额外的技能来实现更复杂的工作流,或者增加更多子智能体来执行各种任务。我们才刚刚触及这些功能的表面,同时仍有一些需要注意的安全问题。首先,我们目前允许 Write 和 Bash 这类工具在无需用户许可的情况下执行。下一步是构建一个像 Claude Code 那样的界面,让用户确认他们是否愿意为某个动作使用那些特定工具。我们也才刚刚开始探索为智能体和子智能体添加类似 Claude Code 的中断(interrupts)等功能的能力。我们已经为你打下了继续构建强大智能体应用程序的基础,迫不及待想看看你接下来会构建什么。

English Script

In this final lesson, we’ll create a research agent using the Claude Agent SDK. The agent will use a skill to create a learning guide for an open source tool based on its documentation, GitHub repo, and web search. Let’s go. Now that we’ve seen how to use skills on the web with Claude using the messages API and Claude Code, let’s talk about how to use skills with the Claude Agent SDK. As a refresher, the Claude Agent SDK is a programmatic way of building your own agentic applications that use the same internal harness that Claude Code does.

What we’re going to be building here is a general purpose research agent. The main agent is going to be able to research information from multiple sources and synthesize a summary. It will dispatch three different subagents for analyzing documentation, analyzing and downloading repositories, and researching information by searching the web. Let’s take a look at those prompts, and then we’ll take a look at a Skill that’s used to guide the Main Agent with a research methodology and what needs to be extracted and synthesized.

To start, we have our main agent prompt. This is the orchestrator that has access to three available subagents with the following capabilities: finding information from documentation, analyzing repository structures, finding articles, videos, and community content to bring it all together. In this particular application, we mention that if the Skill is provided, we want it to follow a particular pattern. It’s possible that Skills may or may not be provided for the application we’re building, but in our case we’re going to provide one. If the skill matches the user’s request, we need to follow that skill’s instructions precisely.

Since we’re starting from scratch with this agentic application, we want to be very intentional about what to do when skills are provided and when they’re not. As we continue, we have a couple of high-level delegation guidelines for how to spawn subagents and, after receiving results, how to synthesize all of those pieces of information. Let’s briefly dive into some of the prompts for our subagents. For the documentation researcher, we’ll have access to WebSearch and WebFetch. We provide a process to locate documentation, particular input formats, guidelines, and an output format for returning findings in a certain way.

For the repository analyzer, we also provide WebSearch to find repositories, Bash commands to clone and run git, and then the ability to read and find files and data within files. Similarly, we provide a process, an input format, guidelines, and an output. Finally, our Web Researcher makes use of WebSearch and WebFetch as well. This allows us to search for content relevant to that topic and to receive extraction instructions as well from the main agent. We also provide guidelines as well as an output format that’s necessary, and if no output format is specified, follow a default structure. All of these prompts will be used together when we set up the code necessary to make our agent work.

Finally, let’s talk about the skill we’re going to be using here. We have a skill named learning-a-tool. The purpose of this skill here is to guide the main orchestrator. We will not be using the skill in our individual subagents, but we’re using this skill as a way to create a predictable pattern so that the main agent knows the ideal workflow and what and how to dispatch subagents. We give the skill a name and a description. And in this case, we want to create learning paths for programming tools, define what information should be researched, and specify how best to follow an approach to researching all the way towards creating a comprehensive learning path.

To start, we have a very particular workflow here. We start with a research phase, where we specify exactly what the official-documentation subagent should look for. For the repository analyzer, it’s a similar kind of approach, and for our web researcher, very similar as well. So we’re using this skill to provide a constant and predictable workflow for how best to work alongside the subagents that the main agent has. Once that data is given to us, we then organize that content into progressive levels. Here, we’re using progressive disclosure to load another markdown file as the source of truth. In our progressive learning file, we can see there’s quite a bit around the individual levels that we want, starting from an overview and motivation, all the way through installation, core concepts, and practical patterns, and then where to go next.
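
As a rough sketch of how this might be laid out in the skill file (the wording below is illustrative, not the course’s actual SKILL.md), the frontmatter carries the name and description the agent uses to decide when to invoke the skill, and the body spells out the workflow:

```markdown
---
name: learning-a-tool
description: Research a programming tool across its documentation, repository, and
  community content, then produce a progressive learning path with resources and
  code examples. Use when the user asks to learn or build a learning guide for a tool.
---

## Workflow

1. Research phase: dispatch the docs, repo, and web subagents in parallel.
2. Organize findings into progressive levels (see the separate progressive-learning file).
3. Produce the output structure: overview, resources, learning path, code examples.
```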

This progressive learning allows us to build levels so that we know how to start from the beginning and know eventually where to go deeper. While this initial skill is useful for learning a tool, you can also imagine that we might have additional skills for maybe comparing one tool with another depending on the data that we’re working with. As we move towards the additional phases of working with this skill, we take that data and specify a structure, and then specify an output. We’re very, very particular with the exact format that we’re working with. The goal here is to get access to a learning environment that gives us an overview, resources, a path, and code examples. The goal here is to combine the research from all of our subagents into a particular output format that we want and do that with consistency and predictability.

Now that we’ve seen at a high level the application we’re going to build, there’s one last piece that we’ll layer on. We can imagine that we want to take the output and write it to a centralized place that we can share with teammates and that might have a nicer interface. To do that, we’re going to use Notion. To connect to Notion, we’re going to use an MCP server and bring in the tools necessary to go ahead and execute that. Now that we’ve examined the underlying prompts for our main agent and subagents, as well as the skill we’re going to be using, let’s go ahead and begin by running uv init to initialize a project and add the necessary dependencies like claude-agent-sdk, python-dotenv, and asyncio.

Once we’ve installed these dependencies, let’s go ahead and create a file called agent.py. So I’ll go ahead, make a new file, and call it agent.py. Inside of our agent.py, I’m going to be adding the necessary code to get started with a small example using the Claude Agent SDK. The boilerplate here brings in asyncio to run this environment, dotenv to load environment variables, and then, from our utils, the display_message function. Just to give some context, display_message gives us a bunch of helpers for truncating and formatting input, and it gives us a nice way to visually display information from the main agent and the subagents. This is very similar to the code we saw when we worked with the API, where we got that nice output for what’s happening in each tool action and iteration.

To start, we set up our Claude agent. We pass in a system_prompt here; this is going to change. We pass in allowed_tools; this is also going to change, but we just want to start with the basics here. To get started with a simple conversation, we set up a loop: accept some user input, run it through our model, take back the response, and send it back to the user. Let’s go and see what that looks like. I’m going to open the terminal again, and we’re going to go ahead and run uv run agent.py. This is going to provide a terminal environment where we can start a conversation. I’ll just start by asking, how are you? In this case, I’m not going to get a ton of valuable information here because I just have a helpful assistant. So what we’re going to start layering on now is the ability for our agent to get access to MCP servers and the correct tools as well.
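
Here’s a minimal sketch of what that starting point might look like. The display_message helper comes from the course’s utils module, and the option and client names reflect the current Python claude-agent-sdk; treat this as an approximation of the boilerplate being described rather than the exact course code:

```python
# agent.py: minimal starting point
# (project setup, roughly: uv init && uv add claude-agent-sdk python-dotenv)
import asyncio

from dotenv import load_dotenv
from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient

from utils import display_message  # course helper for formatting agent output

load_dotenv()  # pull ANTHROPIC_API_KEY (and later NOTION_TOKEN) from .env


async def main():
    options = ClaudeAgentOptions(
        system_prompt="You are a helpful assistant.",  # will change later
        allowed_tools=[],                              # will change later
    )
    async with ClaudeSDKClient(options=options) as client:
        while True:
            user_input = input("You: ")
            if user_input.strip().lower() in {"exit", "quit"}:
                break
            await client.query(user_input)
            # Stream back every message (assistant text, tool calls, results)
            async for message in client.receive_response():
                display_message(message)


if __name__ == "__main__":
    asyncio.run(main())
```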

Let’s go ahead and make some modifications to our main function. Like we mentioned, the allowed_tools are going to change. So what we’re going to start doing is adding the tools that our subagents need so that they can work as expected. Read-only tools like Read, Grep, and Glob are allowed by default. But when we want to start doing things like writing files, searching the web, and executing commands with Bash, we need to pass those in explicitly. So we’ll bring in the Write tool, the Bash tool, and our WebSearch and WebFetch tools. We saw previously that our subagents are going to be making use of these particular tools. Our agent that’s analyzing repositories needs Bash to run git commands and the ability to write files, and our docs researcher and web researcher will make use of searching and fetching.

Now that we’ve brought in these tools, the next thing we’re going to add are the mcp_servers to connect to. We’ll use the mcp_servers keyword argument and specify the name of the MCP server, which in our case is notion. We’re going to pass in some default configuration, specifying the command to run the notion server alongside the Notion environment variable that we have. So before we go ahead, we’ll make sure to load our Notion token from our .env file and import the os module so that we can read it correctly. Now that we’ve loaded our MCP server correctly, we need to make use of the tools that Notion provides. If we would like, we can ask Claude right now: what are all the tools that you get from this MCP server? Or, we can go ahead and add those explicitly.

We do that by using mcp, the name of our server, followed by the name of the tool. In this case, we’re going to be using all of the tools that Notion provides to us. We need to make sure that this mcp_notion_* exists in allowed_tools so that we can give the main agent permission to use this set of tools. We can explicitly add the name of each tool, or, in our case, we’re just going to include all the tools available that mcp_notion provides to us. Now that we’ve set up our mcp_servers and our allowed_tools, let’s go ahead and bring in our subagents and definitions for them. We mentioned that our system_prompt is going to change. To start, we’re going to go ahead and load all of the prompts that we have. We’ll bring in a constant and a helper function to load all of these prompts, and we’ll call that function inside of our main function to bring in these prompts.
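
A sketch of those two changes follows. The @notionhq/notion-mcp-server package name and the NOTION_TOKEN variable are assumptions (check the Notion MCP server’s own docs for the exact values), and note that the SDK and Claude Code name MCP tools as mcp__<server>__<tool>, with the bare mcp__notion prefix standing in for every tool that server exposes:

```python
import os

from claude_agent_sdk import ClaudeAgentOptions

options = ClaudeAgentOptions(
    system_prompt="You are a helpful assistant.",  # replaced with the main agent prompt later
    allowed_tools=[
        # Read-only tools (Read, Grep, Glob) are available by default;
        # anything that writes, runs commands, or hits the network is opt-in.
        "Write", "Bash", "WebSearch", "WebFetch",
        # Allow every tool exposed by the "notion" MCP server.
        "mcp__notion",
    ],
    mcp_servers={
        "notion": {
            "command": "npx",
            "args": ["-y", "@notionhq/notion-mcp-server"],       # assumed package name
            "env": {"NOTION_TOKEN": os.environ["NOTION_TOKEN"]},  # loaded from .env
        }
    },
)
```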

We’re going to make use of these markdown files to load in the text necessary and pass them to our agent options. Before we go ahead and update the main agent, we’re going to add a dictionary that references all of our agents with a definition. We’re bringing in the AgentDefinition class, which we’ll want to make sure we import correctly. We can see in the AgentDefinition that we have a description for our subagent, a prompt that specifies the instructions for the agent, and then the tools that we want that agent to use. This is a similar configuration to what we did in Claude Code. You can see here that we still need to use our main_agent_prompt as well as this dictionary of agents. So we’ll update our system_prompt with the main_agent_prompt, and then we’ll make sure to pass in an additional keyword argument, agents, that references our dictionary of agent definitions.

As you can see here, our researcher, our analyzer, and our web researcher are using tools that we’ve defined here as well. It’s important to make sure that you list all of the tools that your main agent and your subagents will need inside of your allowed_tools, or else your subagents won’t be allowed to use them even if you include the tools here. Now that we’ve set up our agents, we need to make sure we also include the all-important Task tool so that we can dispatch subagents and assign tasks to them. The last piece we need to add here is skills. And the good news is, in order to add skills, there’s just one more tool that we need to add, and that is the Skill tool.
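
Putting that together, a sketch of the subagent definitions might look like the following. The prompt file names and the load_prompt helper are hypothetical stand-ins for the course’s markdown prompts; the per-agent tool lists mirror what the lesson describes:

```python
from pathlib import Path

from claude_agent_sdk import AgentDefinition, ClaudeAgentOptions


def load_prompt(name: str) -> str:
    """Read one of the markdown prompt files (hypothetical prompts/ layout)."""
    return Path("prompts", f"{name}.md").read_text()


main_agent_prompt = load_prompt("main_agent")

agents = {
    "docs_researcher": AgentDefinition(
        description="Finds and extracts information from official documentation.",
        prompt=load_prompt("docs_researcher"),
        tools=["WebSearch", "WebFetch"],
    ),
    "repo_analyzer": AgentDefinition(
        description="Clones repositories and analyzes their structure and code.",
        prompt=load_prompt("repo_analyzer"),
        tools=["WebSearch", "Bash", "Read", "Grep", "Glob", "Write"],
    ),
    "web_researcher": AgentDefinition(
        description="Searches the web for articles, videos, and community content.",
        prompt=load_prompt("web_researcher"),
        tools=["WebSearch", "WebFetch"],
    ),
}

options = ClaudeAgentOptions(
    system_prompt=main_agent_prompt,
    agents=agents,
    allowed_tools=[
        "Task",                                    # required to dispatch subagents
        "Write", "Bash", "WebSearch", "WebFetch",  # must cover the subagents' tools too
        "mcp__notion",
    ],
    # mcp_servers would also be passed here, using the Notion config shown earlier
)
```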

Since we have an environment here with a file system and the ability to execute code using the Bash tool, all we need to add is this Skill tool so that we can correctly read skills and understand how best to use them. Similar to Claude Code, skills are defined inside of a .claude folder, in a folder called skills. Make sure your markdown files are named SKILL.md and your folder is called skills, in the plural. Now that we’ve added the tool for working with skills, there’s one more keyword argument that we need to pass in here: we need to specify where to find this particular set of skills. We do so with a keyword argument called setting_sources. Here we’re going to specify user, to find skills inside of the user directory if we have skills in our home directory, as well as project, which is where we’ve loaded the skills for this particular application.
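
Continuing the options from the previous snippet, a sketch of those last two additions might look like this, assuming the skill lives in this project’s .claude/skills folder:

```python
# Expected layout for the skill, mirroring Claude Code:
#
#   .claude/
#     skills/
#       learning-a-tool/
#         SKILL.md
#         (any supporting markdown files, e.g. the progressive-learning levels)

options = ClaudeAgentOptions(
    system_prompt=main_agent_prompt,
    agents=agents,
    allowed_tools=[
        "Task",
        "Skill",                                   # lets the agent discover and invoke skills
        "Write", "Bash", "WebSearch", "WebFetch",
        "mcp__notion",
    ],
    # Where to look for skills (and other settings): the user's ~/.claude
    # directory and this project's .claude directory.
    setting_sources=["user", "project"],
)
```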

Now that we’ve put this all together, let’s go ahead and test out our agent. We open up the terminal again. I’m going to go ahead and exit, and let’s run this application again with the changes that we’ve made. We’re going to start by learning a little bit about MinerU. For those of you not familiar, MinerU is an open-source library for PDF extraction. The reason we’re using this example is that this is not something Claude is likely to know a ton about from its initial training data. This is going to require external research, analyzing code repositories, community documents, and other sources. We’re going to ask it to create a learning guide and to show me the plan first.

Here, we’re going to start to see that the skill is invoked, and the input here is that skill called learning-a-tool with the args that we specified. So here, we can see that the plan comes first. We still have to go ahead and run what the subagents are going to do, but just like with Claude Code and plan mode, we might want to see what the plan is before we start acting, consuming tokens, and taking time. We can see the research phase of parallel investigation with our different researchers. We can see the structure required by the skill, and then finally, the output that we’re expecting. This looks like a good plan, so we’ll just go ahead and ask it to proceed.

It’s going to start by spawning the docs_researcher subagent, then the repo_analyzer and web_researcher, and executing these in parallel, using the tools that we’ve added under allowed_tools and also passed in to our subagents. We can see that, in parallel, the docs researcher is heading to the documentation, the repo analyzer is looking on GitHub, and the web researcher is searching across tutorials and YouTube guides. We’re extracting information from GitHub repositories using bash commands, while at the same time searching a YouTube channel for video demonstrations. These agents are working in parallel, fetching from different data sources to bring this all together into a compelling tutorial.

Now that the subagents have finished their work, we’re going to create the comprehensive guide, pulling together all the necessary files based on the research that we have here. As instructed in the repository analyzer prompt, we’ve cloned the repository for MinerU and kept it here, and we’ve started to build the folder structure for learning this. We can see here that we have our README and resources, as well as code examples that are being put together. We can see in the README file that it provides the learning path to us: what we’re going to learn, how to use this guide, and, importantly, the time estimates we might need. We can see here that it’s created the README and the resources for us, and the learning path is still in progress. Inside of our resources, we have links and references for MinerU, so let’s take a look at that.

Inside of our resources, we have documentation, the repository, the PyPI package, and the paper underlying this library. We have quick start guides, documentation, and related projects as we pull in additional information from the community, with all kinds of deep dives across a variety of different articles and news coverage. Now we can see that the learning path has been created and it’s time to create the code examples. Let’s take a look at this learning path, starting with an Overview & Motivation: What Problem Does It Solve? We describe the Origin Story of the library, What Existed Before, and some of the problems with those libraries. We can see that this is quite an in-depth guide and learning path, and you can imagine it’s something that would last a long time as you get up to speed, going from knowing very little to becoming an expert in working with this library.

We move into some of the distinct features of the back end of this library, all the way to code examples and many different characteristics for how to use this as efficiently as possible. We can start to see that our code files are being written for hello world examples, concepts, and patterns. For our hello world example, we’ve got a nice README to get started with some first steps, simple extraction to see how to start using this library, as well as installation steps. If there were particular libraries we wanted to use for installing or patterns here, we could always add that to our skill. But right now, this is going to give us a great start to get up and running with this library.

As we look at some of the core concepts, those are being created currently. Now that those are done, we can see in the README where we go next. Once we’ve gotten up and running with the library, we can start to look at some of the fundamental concepts the library covers, as well as compare speeds across the different backends. Finally, we’re going to create practical patterns and examples in this third folder. We can take a look at this folder, and we can see that we have real-world processing pipelines and production use cases. This includes examples of certain patterns, as well as quite in-depth code examples using this library, with docstrings, comments, and everything necessary to use this library to its fullest extent.

We’ll wrap up by validating and creating a summary document, making sure everything has been done correctly. We can take a look at the output, which gives us a complete learning guide, the directory structure as specified in our skill, the learning path with the levels that we requested, and then key features and a quick start to get up and running. The final thing we’re going to do here is write this particular file, the resources.md, to a resources subpage in Notion. This page already exists, so let’s take a look at what that looks like, and then we’ll prompt to go ahead and use our MCP server to do the writing necessary.

We can see here in Notion, under this learning section, that I have a subpage called resources. The goal is to use the MCP server to populate it with what we had in our resources.md. So let’s go ahead and ask our agent to write that file to that subpage in Notion. We’re going to be explicit about the Notion tools we use and allow it to use what we have available. We’ve found the resources page. We’re going to read the resources.md and convert it to the correct format in Notion using rich Notion blocks. You can see here that we’re using multiple tools from Notion, doing this in batches, adding the quick start guides, API documentation, and the rest of the information inside of our resources.md. We can see the resources page dynamically updating based on the documentation in our resources.md, and as this finishes up, we’re going to see all the content from that file appear on our Notion page.

Now that it’s finished, let’s go take a look at what our Notion page looks like. We can see here that we’ve got our official documentation, our tutorials, video resources, and community channels: all the data that came from that markdown file, now written to Notion. We made use of skills, MCP servers, agents, and subagents, all using the Agent SDK. You can imagine layering on additional skills for more complex workflows, or additional subagents to perform a variety of tasks. We’ve just started to scratch the surface of the functionality, and there are still some security concerns that we should be mindful of. For starters, we’re allowing tools like Write and Bash to be executed without requiring permission from the user. The next step here is to build an interface, just like Claude Code, that ensures we allow the user to confirm that they want to use those particular tools for a certain action. We’ve also just started to scratch the surface of the ability to add things like interrupts for our agents and subagents, similar to Claude Code. So we’ve given you the foundation to continue to build powerful agentic applications, and we can’t wait to see what you build next.
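
One way to close that permission gap is the SDK’s permission hook: a can_use_tool callback that runs before each tool call and can allow or deny it. The sketch below assumes the callback and result types exposed by recent versions of the Python claude-agent-sdk (check the current SDK docs for the exact names), and the confirmation prompt is just a plain input() stand-in for a real interface:

```python
from claude_agent_sdk import (
    ClaudeAgentOptions,
    PermissionResultAllow,
    PermissionResultDeny,
)

SENSITIVE_TOOLS = {"Write", "Bash"}


async def confirm_tool_use(tool_name, input_data, context):
    """Ask the user before any write/execute tool runs; allow everything else."""
    if tool_name not in SENSITIVE_TOOLS:
        return PermissionResultAllow()
    answer = input(f"Allow {tool_name} with input {input_data}? [y/N] ")
    if answer.strip().lower() == "y":
        return PermissionResultAllow()
    return PermissionResultDeny(message=f"User declined {tool_name}")


options = ClaudeAgentOptions(
    system_prompt=main_agent_prompt,   # from the earlier snippets
    agents=agents,
    allowed_tools=["Task", "Skill", "Write", "Bash", "WebSearch", "WebFetch", "mcp__notion"],
    setting_sources=["user", "project"],
    can_use_tool=confirm_tool_use,     # consulted before each tool call
)
```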