「吴恩达Agentic AI模块5简报」高度自主的智能体AI模式
概要
本文档综合分析了构建高度自主的AI智能体的核心设计模式,重点阐述了规划(Planning)和多智能体系统(Multi-Agent Systems)这两种前沿方法。这些模式旨在让AI超越预设的指令序列,能够灵活地自主决策以完成复杂任务。
核心洞察包括:
- 规划模式:此模式赋予大语言模型(LLM)根据用户请求和可用工具集,自主生成多步骤行动计划的能力。这种方法极大地增强了AI的灵活性和处理复杂任务的范围,但同时也降低了开发者对系统运行时行为的直接控制和可预测性。
- 规划的实现方式:为确保计划能被可靠执行,应采用结构化格式。其中,通过代码执行进行规划是最为强大的方法,它允许LLM利用其在编程语言(如Python)和大型代码库(如Pandas)上的丰富训练数据来构建复杂计划。相比之下,JSON和XML也是可靠的结构化格式,优于易产生歧义的Markdown或纯文本。
- 多智能体系统:此模式借鉴了人类团队协作的理念,将一个复杂的宏观任务分解为多个子任务,并分配给具有特定角色和工具的专业化智能体。这种方法不仅简化了复杂系统的开发过程,还促进了智能体模块的复用性。
- 智能体通信模式:多智能体系统的协作效率取决于其通信模式。目前最常见的两种模式是线性模式(按顺序依次执行)和层级模式(由一个“管理者”智能体协调多个“工作者”智能体)。此外,还存在更复杂但使用较少的深度层级模式和高度不可预测的“全体对全体”实验性模式。
核心设计模式:规划
规划是实现高度自主智能体的关键设计模式,它使智能体能够摆脱硬编码的步骤束缚,根据具体情境动态制定行动方案。
概念与工作流程
规划模式的核心思想是让LLM扮演决策者的角色,而非仅仅是指令的执行者。其工作流程通常如下:
- 接收输入:向LLM提供用户的原始请求以及一个包含可用工具及其功能的列表。
- 生成计划:提示LLM根据用户目标,返回一个分步执行的计划。
- 顺序执行:系统逐一执行计划中的每个步骤。每一步的输出结果都会作为上下文信息,被传递给下一步,以确保任务的连贯性。
这种模式的显著优势在于,开发者无需预先预测并编码所有可能的工具调用序列,从而使智能体能够应对更加多样化和复杂的请求。
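上面「接收输入、生成计划、顺序执行」的工作流程,可以用一段极简的 Python 草图来示意。注意这只是一个演示流程的示意,并非课程原代码:其中 call_llm 与各工具均为假设的桩实现,真实系统中应分别替换为 LLM API 调用与数据库查询。

```python
import json

# 假设的 LLM 调用:真实系统中应替换为 LLM API;这里为演示直接返回一个固定的 JSON 计划
def call_llm(prompt: str) -> str:
    return json.dumps([
        {"step": 1, "tool": "get_item_descriptions", "arguments": {"shape": "round"}},
        {"step": 2, "tool": "check_inventory", "arguments": {}},
        {"step": 3, "tool": "get_item_price", "arguments": {"max_price": 100}},
    ])

# 假设的工具桩实现:真实系统中它们会查询商品数据库
TOOLS = {
    "get_item_descriptions": lambda prev, args: ["classic-round", "retro-round"],
    "check_inventory":       lambda prev, args: [i for i in prev if i != "retro-round"],
    "get_item_price":        lambda prev, args: {i: 79 for i in prev},
}

def run_planning_agent(user_request: str):
    # 1) 接收输入:把用户请求与可用工具列表一起交给 LLM
    prompt = f"可用工具: {list(TOOLS)}\n用户请求: {user_request}\n请返回 JSON 格式的逐步计划"
    # 2) 生成计划:解析 LLM 返回的结构化计划
    plan = json.loads(call_llm(prompt))
    # 3) 顺序执行:每一步的输出作为下一步的上下文传入
    prev = None
    for step in plan:
        prev = TOOLS[step["tool"]](prev, step["arguments"])
    return prev

result = run_planning_agent("有100美元以下的圆形太阳镜库存吗?")
```

真实实现中,每一步执行前通常还会把任务描述、工具说明等上下文一并交给 LLM,由它决定如何调用工具;这里为突出数据流而将其简化为直接的函数调用。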
应用案例
- 电商客服智能体:假设一个太阳镜零售店的客服智能体被问到:“你们有100美元以下的圆形太阳镜库存吗?” 该智能体可以规划出如下步骤:
- 使用get_item_descriptions工具筛选出所有描述为“圆形”的太阳镜。
- 将筛选结果传入check_inventory工具,确认哪些有库存。
- 最后使用get_item_price工具,检查有库存的圆形太阳镜价格是否低于100美元,并最终回答用户。
- 邮件助手:对于指令“请回复纽约Bob的邮件,告诉他我会参加,然后把他的邮件归档”,智能体可以生成计划:
- 使用search_email工具找到来自Bob的、提及“纽约”和“晚餐”的邮件。
- 生成并发送一封确认参加的回复邮件。
- 使用move_email工具将原邮件移动到归档文件夹。
现状与挑战
规划模式在某些领域已取得显著成功,但在其他领域的应用仍在发展中。
- 成功应用:在高度智能化的软件编码系统中,该模式表现出色。这些系统可以为复杂的软件开发任务制定详细的构建计划(如先构建组件A,再构建组件B,并进行测试),然后像执行清单一样逐步完成。
- 挑战与实验性:在其他行业,规划模式的应用尚不普及。主要挑战在于可控性和可预测性的降低。由于开发者不直接规定执行路径,系统的运行时行为变得难以预料,这给系统的稳定性和可靠性带来了挑战。
规划的实现与优化
为了让LLM生成的计划能够被下游代码清晰、无歧义地解析和执行,选择合适的输出格式至关重要。
结构化规划格式
要求LLM以机器可读的格式输出计划,是确保其可靠执行的前提。不同格式的可靠性存在差异。
| 格式 (Format) | 可靠性 (Reliability) | 备注 (Notes) |
|---|---|---|
| 代码 (Code) | 非常高 (Very High) | 最强大的方法,直接生成可执行代码作为计划,利用现有编程语言和库的强大功能。 |
| JSON | 高 (High) | 常见且可靠,通过键值对(如step, tool, arguments)清晰定义步骤。 |
| XML | 良好 (Good) | 另一种可靠的结构化选项,使用标签明确定义计划结构。 |
| Markdown | 中等 (Medium) | 解析时可能存在轻微歧义,不如JSON或XML严谨。 |
| 纯文本 (Plain Text) | 低 (Low) | 最不可靠的选项,解析困难且容易出错。 |
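下面用一小段 Python 示意一个 JSON 格式的计划长什么样,以及下游代码为何能无歧义地解析它。其中键名 step、description、tool、arguments 是常见约定而非固定标准,工具名沿用前文的太阳镜示例:

```python
import json

# 一个 JSON 格式计划的示例文本(键名与工具名均为沿用前文示例的假设)
plan_text = """
[
  {"step": 1, "description": "筛选圆形太阳镜", "tool": "get_item_descriptions",
   "arguments": {"shape": "round"}},
  {"step": 2, "description": "确认哪些有库存", "tool": "check_inventory",
   "arguments": {}},
  {"step": 3, "description": "检查价格是否低于100美元", "tool": "get_item_price",
   "arguments": {"max_price": 100}}
]
"""

plan = json.loads(plan_text)
# 下游代码可以逐步、无歧义地取出每一步要调用的工具及其参数
steps = [(s["tool"], s["arguments"]) for s in plan]
```

相比之下,同样的计划若用纯文本表述,下游代码就必须靠启发式规则切分步骤、猜测参数,这正是表中纯文本可靠性最低的原因。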
通过代码执行进行规划
这是规划模式的一种高级实现,其核心思想是让LLM直接编写代码来完成任务,而不是输出一个描述步骤的结构化文本。
- 核心理念:LLM生成的代码本身就是计划。例如,当需要分析一个电子表格数据时,LLM可以直接编写Python代码,并利用Pandas库进行数据筛选、计算和汇总。
- 主要优势:
- 克服工具限制:开发者无需为每一个可能的用户查询创建专门的工具。LLM可以灵活组合编程语言中成百上千个现有函数来解决问题,避免了工具集不断膨胀且始终无法覆盖所有边缘情况的“脆性”设计。
- 利用现有知识:LLM在其训练数据中见过海量的代码示例,因此非常擅长调用各种库函数来解决问题。
- 卓越性能:研究表明,在许多任务中,“以代码为行动”(Code as action)的性能优于生成JSON或纯文本计划。
- 安全考量:执行由LLM生成的代码存在安全风险。最佳实践是在沙盒(Sandbox)等受控环境中运行代码,以防止潜在的恶意行为。
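关于沙盒,下面是一个高度简化的 Python 草图:把 LLM 生成的代码写入临时文件,在独立子进程中限时执行。这只是一个示意;生产环境通常还需要容器、系统调用过滤或专用沙盒服务来进一步限制文件与网络访问:

```python
import os
import subprocess
import sys
import tempfile

def run_in_sandbox(code: str, timeout_seconds: int = 5) -> str:
    """在独立子进程中限时执行 LLM 生成的代码,返回其标准输出。
    注意:这只是简化示意,子进程隔离并不能阻止文件或网络访问,
    生产环境还需容器、seccomp 等更强的隔离手段。"""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        completed = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout_seconds,
        )
        return completed.stdout
    finally:
        os.remove(path)

# 假设这段代码来自 LLM 的输出
output = run_in_sandbox("print(1 + 1)")
```

timeout 参数保证了即使 LLM 生成了死循环代码,主系统也不会被拖垮。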
高级模式:多智能体系统
当任务的复杂性超出一个单一智能体的处理能力时,可以采用多智能体系统,将任务分解并交由一个协同工作的智能体团队来完成。
核心理念与类比
多智能体系统的理念是将复杂的宏观任务分解为多个独立的子任务,并为每个子任务设计一个专门的智能体。
- 类比:这好比在现实世界中,与其雇佣一个“全能”的通才,不如组建一个由研究员、设计师和文案等专家组成的团队来共同完成一个营销项目。在计算机科学中,这也类似于将一个大型程序分解为多个进程或线程来并行处理。
- 优势:这种方法为开发者提供了一个强大的心智模型,有助于将复杂问题模块化,并能构建可复用的智能体组件(例如,一个通用的“图表生成”智能体可以被用于多个不同的项目中)。
构建与工作流程
以创建一个营销宣传册为例,多智能体系统的工作流程如下:
- 定义角色与工具:
- 研究员智能体:负责分析市场趋势和竞争对手。其核心工具是网页搜索。
- 平面设计师智能体:负责创作图表和视觉素材。其工具可能包括图像生成API或代码执行(用于生成数据图表)。
- 文案撰写员智能体:负责将研究结果和视觉素材整合成最终文案。它可能不需要外部工具,主要依赖LLM自身的文本生成能力。
- 构建智能体:通过特定的提示词(Prompt)为每个角色构建智能体。例如,对研究员智能体的提示可以是:“你是一个专业的市场研究员,擅长分析市场趋势和竞争对手……”
- 协调协作:定义一个清晰的通信模式,让各个智能体能够有序地协作,最终产出完整的营销宣传册。
多智能体通信模式
智能体之间的通信与协作方式是多智能体系统设计的核心。不同的通信模式决定了系统的组织结构和工作效率。
常见模式
目前,最实用且最常见的通信模式主要有两种:
- 线性模式 (Linear Pattern):这是一种简单的顺序工作流。智能体A完成其任务后,将其输出传递给智能体B,B再传递给C,以此类推。例如:研究员 -> 平面设计师 -> 文案撰写员。
- 层级模式 (Hierarchical Pattern):该模式引入了一个“管理者”角色。这个管理者智能体负责协调和分配任务给其他“工作者”智能体,并整合它们的工作成果。例如,一个“营销经理”智能体可以先后调用研究员、设计师和文案员,并对最终产出进行审核。
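以线性模式为例,可以用三个桩函数粗略示意「研究员 -> 平面设计师 -> 文案撰写员」的顺序工作流。真实系统中,每个函数内部是一次带角色提示词的 LLM 调用,这里仅为演示数据如何逐级传递:

```python
# 用桩函数模拟智能体;真实系统中每个函数内部是一次带角色提示词的 LLM 调用
def researcher(brief: str) -> str:
    return f"研究报告:针对『{brief}』的市场趋势与竞品分析"

def designer(research: str) -> str:
    return f"视觉素材:基于({research})生成的图表与配图"

def copywriter(research: str, assets: str) -> str:
    return f"宣传册:整合了({research})与({assets})的最终文案"

# 线性模式:上一个智能体的输出直接成为下一个智能体的输入
def linear_pipeline(brief: str) -> str:
    research = researcher(brief)
    assets = designer(research)
    return copywriter(research, assets)

brochure = linear_pipeline("太阳镜夏季促销")
```

线性模式的优点是数据流一目了然、易于调试;代价是后面的智能体无法把问题「退回」给前面的智能体重做。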
高级与实验性模式
除了上述两种常见模式外,还存在一些更复杂或更具实验性的模式:
- 深度层级模式 (Deep Hierarchy):这是一种多层级的组织结构。一个智能体本身也可以是管理者,负责协调其下属的子智能体。例如,研究员智能体可以管理一个“网络数据抓取”子智能体和一个“事实核查”子智能体。这种模式因其复杂性而较少使用。
- 全体对全体(All-to-All)模式:在这种模式下,所有智能体都可以随时自由地与其他任何智能体进行通信,类似于一个开放的聊天室。这种模式非常难以预测和控制,结果带有一定的“混乱”和随机性,适用于那些对最终结果的精确控制要求不高的探索性应用。
结论与展望
本文档所探讨的规划和多智能体系统设计模式,代表了构建更强大、更自主AI系统的发展方向。与反射(Reflection)和工具使用(Tool Use)等基础模式相结合,它们共同构成了现代智能体AI开发的核心技能栈。
虽然这些高级模式赋予了AI前所未有的自主性,但也带来了控制性和可预测性方面的挑战。掌握这些模式不仅能够帮助开发者构建出色的AI应用,同时也在专业领域中展现出极高的价值。随着技术的不断成熟,这些模式有望在更多领域得到应用和完善。
「吴恩达Agentic AI 模块5」4个自主AI智能体构建模式:从指令执行者到战略家
引言:从指令跟随者到战略思考者
我们大多数人与AI助手的互动经验,都停留在下达一次性指令并获得相应结果的模式上。我们让它写一封邮件、总结一篇文章或者回答一个简单的问题。这些AI系统虽然强大,但本质上是被动的指令执行者,等待着我们的下一步指示。
然而,AI领域正在发生一场深刻的变革。新一代的“智能体AI”(Agentic AI)正在崛起,它们不再仅仅是指令的接收者,而是能够自主规划、制定策略并执行复杂多步骤任务的战略思考者。这种AI能够像人类一样,为了达成一个宏观目标而自主地分解任务、调用工具并协同工作。
本文将从吴恩达(Andrew Ng)的课程中提炼出四个最具影响力的设计模式,帮助你理解如何构建这些高度自主的智能体。这些模式将彻底改变我们对AI能力的认知,并为开发者开辟了全新的可能性。
1. 让AI自己写“待办清单”:规划模式 (Planning Pattern)
规划设计模式的核心思想很简单却极具颠覆性:开发者不再需要为AI硬编码任务执行的每一步顺序,而是让大语言模型(LLM)自己生成一个实现目标的详细步骤计划。
以一个太阳镜零售店的客服智能体为例。当顾客提出一个复杂问题,比如“你们有库存的圆形太阳镜吗?价格要低于100美元。”在规划模式下,智能体会首先生成一个行动计划,可能包含以下步骤:获取商品描述以筛选出圆形太阳镜,然后检查库存确认是否有货,最后获取商品价格以判断是否低于100美元。接着,系统会有条不紊地执行这个计划,将第一步的输出(例如,圆形太阳镜的列表)作为第二步的输入,再将第二步的结果(有库存的圆形太阳镜)作为第三步的输入,最终整合所有信息,生成给用户的最终答复。
这种转变意义重大。它赋予了AI极大的灵活性和自主性,使其能够处理各种预料之外的任务组合,而无需开发者为每一种可能性都预先设计工作流。这标志着我们从“编程”AI的行为转向“引导”AI自主规划其行为。
“构建能够自我规划的智能体,最酷的一点在于,你无需预先硬编码大语言模型为完成复杂任务可能采取的确切步骤顺序。”
不过,从战略视角出发,我们也必须认识到该模式的现状。目前,除了在自主编程等领域应用得非常成功外,规划模式在许多其他应用中仍处于“实验阶段”,尚未得到非常广泛的使用。其主要挑战在于可控性——由于开发者在运行时无法预知AI会生成什么样的计划,这使得系统行为变得“难以控制”。尽管如此,这项技术仍在不断成熟,潜力巨大。
2. 终极规划是代码:让智能体为自己编程
传统的基于工具的方法有一个明显的问题。例如,在处理一个包含咖啡销售数据的电子表格时,如果用户的查询越来越复杂,开发者可能会发现自己需要创建无数个特定的工具(如“获取最大值”、“筛选行”等)。这种方法不仅效率低下,而且非常脆弱,无法应对无穷无尽的新需求。
一个更强大且反直觉的解决方案是:让LLM直接编写并执行代码(例如,使用Python及其pandas库)来解决用户的查询。这种“代码即行动”(code as action)的模式,将规划提升到了一个全新的维度。
这种方法之所以如此高效,是因为LLM在其训练数据中已经学习了数千个编程语言和库函数的用法。通过编写代码,AI可以从这个庞大的函数库中自由组合,创造出比简单地串联几个预定义工具远为丰富和复杂的执行计划。研究论文也证实,让模型通过编写代码来制定计划,其性能始终优于那些用JSON或纯文本制定计划的模型。
“通过让你的大语言模型编写代码,它可以从成百上千个它已经见过大量数据并知道何时使用的相关函数中进行选择。这使得它能够从这个非常庞大的库中将不同的函数调用串联起来,从而为回答像这样相当复杂的查询制定出计划。”
3. 告别“超级智能体”,构建AI“团队”
当我们面对一个极其复杂的任务时,与其试图构建一个无所不能的“超级智能体”,不如借鉴现实世界中的团队协作模式。这个核心理念就是“多智能体工作流”(multi-agent workflow)。
开发者可以像招聘一个人类团队一样,将一个大任务分解成多个专门的角色。以制作市场营销手册为例,你可以创建三个独立的智能体:一个研究员智能体,负责分析市场趋势;一个平面设计师智能体,负责生成图表和视觉素材;以及一个写手智能体,负责整合信息并撰写文案。每个智能体都有明确的分工和专属的工具集,例如,研究员使用网络搜索API,而设计师则调用图像生成API。
这种思维框架对开发者极为有用。它将一个庞大、令人望而生畏的问题分解为一系列更小、更易于管理的子任务。同时,这也使得构建高度专业化、可复用的智能体成为可能。
“……如果你有一个复杂的任务要执行,有时,与其思考如何雇佣一个人来为你完成,你可能会考虑雇佣一个团队,由几个人来为你完成任务的不同部分。”
4. 设计AI的“组织架构图”:从流水线到创意协作
一旦你拥有了一个AI“团队”,下一步就是设计它们之间的沟通与协作方式,这就像为一家公司设计组织架构图一样关键。以下是几种常见的沟通模式:
- 线性模式 (Linear): 这是最简单的模式,如同“流水线”。一个智能体的输出直接成为下一个智能体的输入。例如,研究员完成报告后交给设计师,设计师完成素材后交给写手。
- 层级模式 (Hierarchical): 这种模式类似“经理领导的团队”。一个中心的“经理”智能体负责协调和分派任务给其他下属智能体。这创造了一个强大的四智能体系统:研究员、设计师和写手是执行者,而第四个智能体——营销经理——本身不执行核心任务,而是负责统筹和调度其专业团队的工作。
- 全体对全体模式 (All-to-All): 这是一种更具实验性的模式,可以看作是一场“协作头脑风暴”。在这个模式下,任何智能体都可以在任何时候与其他任何智能体进行交流。
全体对全体的模式虽然可能激发强大的创造力,但也存在明显的权衡。正如吴恩达所指出的,这种模式的结果“混乱”且“难以预测”。因此,它更适合那些能够容忍一定不可预测性、追求创新而非稳定输出的应用场景。
“在实践中,我发现全体对全体沟通模式的结果有点难以预测……对于那些你愿意容忍一点混乱和不可预测性的应用,我确实看到一些开发者在使用这种沟通模式。”
结语:未来属于智能体 (Agentic)
我们正在经历一个根本性的思维转变:从简单地向AI“提问”,到设计和管理一个由能够自主规划、编程和协作的智能体组成的自治系统。这四个模式——规划、代码即行动、多智能体团队和沟通架构——是构建下一代AI应用的基础。
随着这些智能体系统变得日益普遍,我们的角色将如何从AI的“使用者”演变为智能AI团队的“架构师”与“管理者”?
「吴恩达Agentic AI 模块5」高度自主智能体AI模式学习指南
本学习指南旨在帮助您深入理解并掌握构建高度自主智能体AI的核心设计模式。内容涵盖了规划工作流、多智能体系统以及它们在实际应用中的实现方式。
测验
简答题
请使用2-3句话回答以下问题,以检验您对核心概念的理解。
- 在智能体AI中,“规划”设计模式的核心思想是什么?
- 为什么让大型语言模型(LLM)以JSON或XML等结构化格式输出其计划是有益的?
- 什么是“通过代码执行进行规划”?在什么情况下它特别有效?
- 根据源材料,使用规划模式的一个主要挑战或缺点是什么?
- 什么是多智能体工作流,其背后的核心理念是什么?
- 在多智能体系统中,单个智能体通常是如何被创建的?
- 请描述多智能体系统中的“线性”沟通模式。
- 请描述多智能体系统中的“层级式”沟通模式。
- 根据所提供的研究,将代码作为行动(规划)与使用JSON或纯文本相比,效果如何?
- 材料中用什么类比来解释多智能体系统(即使在单台计算机上运行)的价值?
答案解析
- 在智能体AI中,“规划”设计模式的核心思想是什么? “规划”设计模式允许智能体AI灵活地自行决定完成任务所需的步骤顺序,而无需开发者预先硬编码步骤。智能体会首先生成一个多步骤的计划,然后逐一执行计划中的每个步骤,以响应复杂的用户请求。
- 为什么让大型语言模型(LLM)以JSON或XML等结构化格式输出其计划是有益的? 使用JSON或XML等结构化格式,可以使下游代码以清晰、无歧义的方式解析计划的具体步骤。这确保了计划的每个部分(如步骤描述、使用的工具和参数)都能被系统可靠地、系统化地逐一执行。
- 什么是“通过代码执行进行规划”?在什么情况下它特别有效? “通过代码执行进行规划”是指让LLM直接编写代码来表达和执行一个复杂的计划,而不是输出JSON等格式的步骤列表。当任务可以通过编程逻辑完成时,这种方法尤其强大,因为它允许LLM利用编程语言和库中成百上千个现有函数,从而能处理比预定义工具集更广泛、更复杂的查询。
- 根据源材料,使用规划模式的一个主要挑战或缺点是什么? 使用规划模式的主要挑战是系统有时会变得难以控制。由于开发者无法预先知道智能体在运行时会生成什么样的计划,因此系统的行为会变得更难预测,这给控制带来了困难。
- 什么是多智能体工作流,其背后的核心理念是什么? 多智能体工作流是指让多个智能体协作完成一项任务,而不是依赖单个智能体。其核心理念是将一个复杂的任务分解为多个子任务,并为每个子任务指派一个具有特定角色和技能的智能体,就像组建一个人类团队来分工合作一样。
- 在多智能体系统中,单个智能体通常是如何被创建的? 单个智能体通常是通过向大型语言模型(LLM)提供特定的提示(prompting)来创建的。提示会指示LLM扮演一个特定的角色(如研究员、图形设计师或作者),并赋予其完成该角色任务所需的工具和背景信息。
- 请描述多智能体系统中的“线性”沟通模式。 线性沟通模式是一种工作流,其中智能体按顺序逐一完成其工作。第一个智能体完成任务后,将其输出传递给第二个智能体,第二个智能体再将其输出传递给第三个,以此类推,直到最终任务完成。例如,研究员先工作,然后是图形设计师,最后是作者。
- 请描述多智能体系统中的“层级式”沟通模式。 层级式沟通模式涉及一个“管理者”智能体,它负责协调和委派任务给其他多个“团队成员”智能体。管理者智能体制定计划,将具体任务分配给下属智能体,并接收它们的工作成果,然后决定下一步行动,形成一种自上而下的协调结构。
- 根据所提供的研究,将代码作为行动(规划)与使用JSON或纯文本相比,效果如何? 研究表明,让LLM通过编写代码来表达计划和执行动作,其效果优于让它编写JSON或纯文本格式的计划。总体趋势是,代码规划优于JSON规划,而JSON规划又稍好于纯文本规划。
- 材料中用什么类比来解释多智能体系统(即使在单台计算机上运行)的价值? 材料中使用的类比是计算机中的多进程或多线程。尽管计算机只有一个CPU,但开发者将工作分解为多个进程或线程,可以更容易地编写和管理复杂的程序。同样,将智能体任务分解为多个智能体,为开发者提供了一个有用的心智框架,可以更轻松地将复杂任务分解为可管理的子任务。
论述题
请思考并详细阐述以下问题,以深化您对相关概念的综合理解。
- 论述智能体AI中“规划”模式的演变过程,从简单的纯文本计划,到JSON/XML格式,再到最终的通过代码执行进行规划。分析每个阶段的优势和局限性。
- 比较并对比“规划”设计模式与“多智能体工作流”。这两种模式如何能像市场营销经理的例子那样结合起来使用?
- 分析智能体系统中“控制”与“自主性”之间的权衡关系,并具体参考规划模式和“全体对全体”(all-to-all)沟通模式中提到的挑战。
- 假设您被要求构建一个智能体系统,以自动化撰写一篇复杂研究论文的过程。请利用源材料中的概念,设计一个多智能体系统来完成此任务。定义智能体的角色、它们必需的工具,以及您将实施的沟通模式,并解释您的选择理由。
- 源材料提到,规划在“高度智能化的编码系统”中尤其成功。根据所提供的背景信息,解释为什么通过代码执行进行规划对于软件开发任务如此有效,并讨论其潜在风险以及必要的预防措施(如使用沙盒)。
关键术语词汇表
| 术语 (Term) | 定义 (Definition) |
|---|---|
| 智能体AI (Agentic AI) | 一类能够自主规划并执行一系列动作以完成复杂任务的人工智能系统。 |
| 规划 (Planning) | 一种设计模式,其中智能体首先生成一个多步骤的行动计划来响应用户请求,然后按顺序执行该计划,而不是依赖预先硬编码的指令。 |
| 工具 (Tools) | 提供给智能体的特定功能或API,使其能够执行超出LLM原生能力的任务,例如查询数据库、发送电子邮件或进行网络搜索。 |
| JSON格式 (JSON Format) | 一种轻量级的数据交换格式,用于让LLM以结构化、机器可读的方式输出其计划,以便下游代码能够清晰、无歧义地解析和执行。 |
| 通过代码执行进行规划 (Planning with code execution) | 一种高级规划技术,让LLM直接生成可执行的代码(如Python代码)来表达其计划。这使得智能体能够利用编程语言的强大功能和庞大的函数库来完成复杂任务。 |
| 多智能体工作流 (Multi-agent workflow) | 一种系统设计,其中多个具有不同角色和技能的智能体协同工作以完成一个共同的复杂任务,类似于一个人类团队的分工协作。 |
| 线性沟通模式 (Linear communication pattern) | 一种多智能体协作模式,其中信息和任务按顺序从一个智能体传递到下一个,形成一条直线式的工作流。 |
| 层级式沟通模式 (Hierarchical communication pattern) | 一种多智能体协作模式,其中一个“管理者”智能体负责协调其他多个下属智能体的工作,进行任务分配和结果汇总,形成一种类似组织架构的沟通结构。 |
| 全体对全体沟通模式 (All-to-all communication pattern) | 一种复杂的、非结构化的沟通模式,其中系统中的任何智能体都可以在任何时候与其他任何智能体进行通信。这种模式灵活性高,但结果难以预测和控制。 |
| 沙盒 (Sandbox) | 一种安全执行环境,用于运行由LLM生成的代码。它将代码的执行与主系统隔离开来,以防止潜在的恶意或不安全操作对系统造成损害。 |
模块5:高度自主代理的设计模式「Andrew Ng:Agentic AI」
5.1 规划工作流
欢迎来到最后一个模块,在这里你将学习一些设计模式,让你能够构建高度自主的代理。在使用这些模式时,你无需预先硬编码要采取的步骤顺序,代理可以更灵活地自行决定要采取哪些步骤来完成任务。我们将讨论规划设计模式,以及在本模块后面部分,如何构建多代理系统。让我们开始吧。
示例一:零售客户服务
假设你经营一家太阳镜零售店,并且你的库存中有哪些太阳镜的信息都存储在数据库中。你可能希望有一个客户服务代理能够回答像“你们有库存的圆形太阳镜吗?价格在100美元以下”这样的问题。这是一个相当复杂的查询,因为你必须查看产品描述,看哪些太阳镜是圆形的,然后查看哪些有库存,最后再看哪些价格低于100美元,才能告诉顾客:“是的,我们有经典款太阳镜。” 你如何构建一个能回答像这样以及许多其他各种客户查询的代理呢?
为了做到这一点,我们将给 LLM 一套工具,让它能够:
- 获取商品描述(比如查找不同的眼镜是否是圆形的)
- 检查库存
- 处理商品退货(这个查询不需要,但其他查询可能需要)
- 获取商品价格
- 检查过去的交易记录
- 处理商品销售等等。
为了让 LLM 弄清楚响应客户请求应该使用什么正确的工具顺序,你可能会写一个这样的提示:“你可以使用以下工具…”,然后给它每个工具(比如说六个或更多工具)的描述,接着告诉它“返回一个执行用户请求的逐步计划”。
在这种情况下,为了回答这个特定的查询,一个 LLM 可能输出的合理计划可能是:
- 首先,使用 get_item_descriptions 检查不同的描述以找到圆形太阳镜。
- 然后,使用 check_inventory 查看它们是否有库存。
- 再使用 get_item_price 查看有库存的结果是否低于100美元。
在 LLM 输出了这个包含三个步骤的计划之后,我们可以将第一步的文本(即这里用红色写的文本)传递给一个 LLM,可能还会附加上下文,比如有哪些工具、你的用户查询是什么、以及其他背景信息,然后让 LLM 执行第一步。在这种情况下,希望 LLM 会选择调用 get_item_descriptions 来获取相应的商品描述,该步骤的输出能让它选出哪些是圆形太阳镜。
然后,第一步的输出会连同第二步的指令(即我这里用蓝色标出的指令)一起传递给一个 LLM,以执行计划的第二步。希望它会接着处理我们在上一张幻灯片中找到的两副圆形太阳镜并检查库存。
第二步的输出随后会用于另一次 LLM 调用,其中包含了第二步的输出以及第三步要做什么的指令。将这些传递给 LLM,让它获取商品价格,最后这个输出会最后一次反馈给 LLM,以生成给用户的最终答案。
在这张幻灯片中,我稍微简化了很多细节。LLM 实际编写的计划通常比这些简单的一行指令更详细,但基本的工作流程是,让一个 LLM 写出一个包含多个步骤的计划,然后让它在适当的上下文(比如任务是什么、有哪些可用工具等)中,依次执行计划的每一步。
使用 LLM 以这种方式进行规划的激动人心之处在于,我们不必预先决定调用工具的顺序,就能回答一个相当复杂的客户请求。如果客户提出一个不同的请求,比如“我想退回我购买的金色镜框眼镜,而不是金属镜框的”,那么你可以想象一个 LLM 同样能够想出一个不同的计划,根据他们之前购买的记录,通过 get_item_descriptions 找出他们买了哪些眼镜,哪些是他们想退回的金色镜框眼镜,然后可能调用 process_item_return。所以,有了一个能像这样进行规划的代理,它就能执行更广泛的任务,这些任务可能需要以许多不同的顺序调用许多不同的工具。
示例二:邮件助手
再看一个规划的例子,让我们来看一个邮件助手。如果你想告诉你的助手:“请回复纽约的 Bob 发来的那封邮件邀请,告诉他我会参加,并把他的邮件归档。” 那么,一个邮件助手可能会被赋予像这样的工具:搜索邮件、移动邮件、删除邮件和发送邮件。你可能会写一个助手提示,说:“你可以使用以下工具,…请返回逐步的计划。” 在这种情况下,LLM 可能会说,完成这个任务的步骤是:
- 使用 search_email 找到 Bob 发来的那封提到“晚餐”和“纽约”的邮件。
- 然后生成并发送一封邮件以确认参加。
- 最后将那封邮件移动到归档文件夹。
鉴于这个计划看起来是合理的,你接下来会再次让一个 LLM 按部就班地执行这个计划。所以,第一步的文本(这里用红色显示)会被连同额外的背景上下文一起提供给 LLM,希望它会触发 search_email。然后,该操作的输出可以再次连同第二步的指令一起提供给一个 LLM,以发送一个适当的回复。最后,假设邮件已成功发送,你可以取那个输出,让 LLM 执行第三步,将 Bob 的邮件移动到归档文件夹。
总结
规划设计模式已经在许多高度 agentic 的编码系统中成功使用。如果你要求它编写一个软件来构建某个相当复杂的应用,它实际上可能会想出一个计划来先构建这个组件,再构建那个组件,几乎形成一个清单,然后一步步地执行这些步骤,来构建一个相当复杂的软件。
对于许多其他应用,规划的使用可能仍然更具实验性,尚未被非常广泛地使用。规划的挑战之一是,它有时会让系统有点难以控制,因为作为开发者,你并不知道在运行时它会想出什么样的计划。所以我认为,除了在高度 agentic 的编码系统中(它在那里确实工作得很好),规划在其他领域的采用仍在增长。但这是一项激动人心的技术,我认为它会不断进步,我们将在越来越多的应用中看到它。
构建能够自己规划的代理的酷炫之处在于,你不需要预先硬编码 LLM 为完成一个复杂任务可能采取的确切步骤顺序。现在,我知道在这个视频中,我以一个相当高的层次讲解了规划过程,即列出步骤列表,然后让一个 LLM 一步步地执行计划。但这到底是如何工作的呢?在下一个视频中,我们将更深入地探讨,看看这些计划的内部到底是什么样的,以及如何将它们串联起来,让一个 LLM 为你规划并执行计划。让我们在下一个视频中看一看。
5.2 创建和执行 LLM 计划
在这个视频中,我们将详细探讨如何提示一个 LLM 来生成一个计划,以及如何读取、解释和执行那个计划。让我们开始吧。
这是你在上一个视频中看到的客户服务代理的计划,我用简单的文本描述以一个较高的层次呈现了这个计划。让我们来看一看,你如何能让一个 LLM 写出非常清晰的、超越这些简单高层次文本描述的计划。
事实证明,许多开发者会要求一个 LLM 以 JSON 格式来格式化它想要执行的计划,因为这能让下游的代码以相对清晰和明确的方式解析出计划的具体步骤是什么,而且目前所有领先的 LLM 都非常擅长生成 JSON 输出。
所以,系统提示可能会这样说:“你可以使用以下工具…”,然后“以 JSON 格式创建一个逐步的计划”,你可能会足够详细地描述这个 JSON 格式,目的是让它输出一个像右边这里展示的计划。
在这个 JSON 输出中,它创建了一个列表,列表的第一个项目有清晰的键和值,说明计划的第一步有如下描述,并且应该使用如下工具,并向该工具传递如下参数。然后,计划的第二步是执行这个任务,然后使用这个工具,等等。所以,这种 JSON 格式,相对于用英语写计划,能让下游代码更清晰地解析出计划的确切步骤,以便能够可靠地一步步执行。
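把这种 JSON 计划交给下游代码执行,大致可以写成下面的草图。工具注册表中的实现均为假设的桩函数,仅演示「按工具名分派、并把上一步输出传给下一步」这一执行骨架:

```python
import json

# 假设的工具注册表:键为工具名,值为桩实现(真实系统中是数据库查询等)
TOOL_REGISTRY = {
    "get_item_descriptions": lambda args, prev: ["round-classic", "aviator"],
    "check_inventory":       lambda args, prev: [i for i in prev if "round" in i],
    "get_item_price":        lambda args, prev: {i: 89 for i in prev},
}

def execute_plan(plan_json: str):
    """逐一执行 JSON 计划的每个步骤,并把上一步的结果传给下一步。"""
    prev = None
    for step in json.loads(plan_json):
        tool = TOOL_REGISTRY[step["tool"]]      # 按工具名分派到对应实现
        prev = tool(step.get("arguments", {}), prev)
    return prev

# 模拟一份 LLM 输出的 JSON 计划
plan_json = json.dumps([
    {"step": 1, "tool": "get_item_descriptions", "arguments": {}},
    {"step": 2, "tool": "check_inventory", "arguments": {}},
    {"step": 3, "tool": "get_item_price", "arguments": {}},
])
result = execute_plan(plan_json)
```

正因为每一步的工具名和参数都有明确的键,下游代码才能像这样机械地分派执行,而无需对自然语言做任何猜测性解析。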
除了 JSON,我也看到一些开发者使用 XML,你可以使用 XML 分隔符,用 XML 标签来清晰地指定计划的步骤是什么以及步骤编号。有些开发者,我感觉比较少,会使用 Markdown,它在解析方面有时会稍微模糊一些,而我认为纯文本可能是这些选项中最不可靠的。但我认为,要么是 JSON(我这里展示的),要么是 XML,都会是要求 LLM 以明确的方式格式化计划的好选择。
就是这样。通过以 JSON 格式打开计划,你就可以解析它,并让下游的工作流更有系统地执行计划的不同步骤。
现在,在让 LLM 进行规划方面,事实证明还有一个非常巧妙的想法,能让一个 LLM 输出非常复杂的计划并可靠地执行它们,那就是让它们编写代码,并让代码来表达计划。让我们在下一个视频中看一看这个。
5.3 通过代码执行进行规划
通过代码执行进行规划,这个想法是,与其要求一个 LLM 以,比如说,JSON 格式输出一个计划来一步步执行,为什么不让 LLM 直接尝试编写代码呢?这些代码可以包含计划的多个步骤,比如调用这个函数,然后调用那个函数,再调用这个函数。通过执行 LLM 生成的代码,我们实际上可以执行相当复杂的计划。让我们来看一看你可能想在什么时候使用这个技术。
假设你想构建一个系统,根据一个包含像这样过往销售数据的电子表格,来回答关于咖啡机销售的问题。你可能会给一个 LLM 一套工具,比如:
- get_column_max:查看某一列并获取最大值(这样可以回答“最贵的咖啡是什么?”)
- get_column_mean
- filter_rows
- get_column_min
- get_column_median
- sum_rows
- 等等

这些是你可能给一个 LLM 的一系列工具,用来以不同方式处理这个电子表格或这些行列数据。
现在,如果一个用户问:“哪个月份的热巧克力销量最高?” 事实证明,你可以用这些工具来回答这个查询,但这相当复杂。你必须用 filter_rows 来提取一月份热巧克力的交易,然后对它做统计,然后对二月重复这个过程,算出统计数据,然后对三月、四月、五月,一直到十二月都重复一遍,然后取最大值。所以你实际上可以用一个相当复杂的过程,把这些工具串联起来,但这并不是一个很好的解决方案。
更糟糕的是,如果有人问:“上周有多少笔独立交易?” 嗯,这些工具不足以得到那个答案,所以你可能最终会创建一个新工具 get_unique_entries。或者你可能会遇到另一个查询:“最后五笔交易的金额是多少?” 那你就得再创建一个工具来获取数据以回答那个查询。在实践中,我看到一些团队,当他们遇到越来越多的查询时,最终会创建越来越多的工具,试图给 LLM 足够多的工具来覆盖某人可能对这样一个数据集提出的所有问题。所以这种方法是脆弱、低效的,我看到一些团队不断地处理边缘情况并试图创建更多的工具。
但事实证明,有一种更好的方法,那就是如果你提示 LLM 说:“请编写代码来解决用户的查询,并将你的答案以 Python 代码的形式返回”,也许用 <execute_python> 和 </execute_python> 这些 XML 标签来界定,那么 LLM 就可以直接编写代码,将电子表格加载到一个数据处理库中(这里它使用的是 pandas 库),然后它实际上是在构思一个计划。这个计划是,在加载了 CSV 文件之后,首先它必须确保日期列以某种方式被解析,然后按日期排序,选择最后五笔交易,只显示价格列,等等。但这些就是计划的第一、二、三、四、五步。
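LLM 为「最后五笔交易的金额」这类查询生成的代码,大致形如下面的草图。注意这并非课程原始示例:列名与数据都是为演示而假设的:

```python
import io
import pandas as pd

# 假设的销售数据;真实场景中由 pd.read_csv("sales.csv") 载入
csv_data = io.StringIO(
    "date,product,price\n"
    "2024-01-03,hot chocolate,4.5\n"
    "2024-01-01,latte,5.0\n"
    "2024-01-06,espresso,3.0\n"
    "2024-01-05,latte,5.0\n"
    "2024-01-02,mocha,4.0\n"
    "2024-01-04,espresso,3.0\n"
)
df = pd.read_csv(csv_data)

# 步骤一:确保日期列被正确解析
df["date"] = pd.to_datetime(df["date"])
# 步骤二:按日期排序
df = df.sort_values("date")
# 步骤三:选择最后五笔交易
last_five = df.tail(5)
# 步骤四:只取价格列作为答案
prices = last_five["price"].tolist()
```

代码中的注释本身就是计划的各个步骤;执行这段代码,就等于执行了整个计划,而无需任何额外的计划解析逻辑。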
因为像 Python 这样的编程语言,在这个例子中还导入了 pandas 数据处理库,它有许多内置的函数,成百上千甚至上万个函数。而且,这些是 LLM 在何时调用方面已经看过大量数据的函数。通过让你的 LLM 编写代码,它可以从这成百上千个它已经看过大量数据知道何时使用的相关函数中进行选择,这让它能够将不同选择的函数调用串联起来,从而为回答像这样相当复杂的查询想出一个计划。
再举一个例子。如果有人问:“上周有多少笔独立交易?” 嗯,它可以想出一个计划:读取 CSV 文件、解析日期列、定义时间窗口、筛选行、删除重复行、然后计数。这个的细节不重要,但希望你能看到的是,如果你读这里的注释,LLM 大致上是在想出一个四步计划,并用你可以直接执行的代码来表达每一步,这将为用户得到他们的答案。
所以,对于那些任务可以合理地通过编写代码来完成的应用,让一个 LLM 用你可以为它执行的软件代码来表达它的计划,可以是一种非常强大的方式,让它能够编写丰富的计划。当然,我在关于工具使用的模块中提到的那个警告,即考虑是否需要找到一个安全的执行环境(如沙箱)来运行代码,这里也适用。尽管我知道,即使这可能不是最佳实践,我也知道很多开发者不使用沙箱。
最后,事实证明,用代码进行规划效果很好。从这张改编自 Xingyao Wang 等人研究论文的图中,你可以看到,对于他们研究的任务,在许多不同的模型上,“代码即行动”(即邀请 LLM 编写代码并通过代码采取行动)都优于让它编写 JSON 再将 JSON 转换为行动,也优于纯文本。你也看到了一个趋势,即编写代码优于让 LLM 以 JSON 编写计划,而以 JSON 编写计划也比以纯文本编写计划要好一些。
当然,有些应用你可能想给你的自定义工具让 LLM 使用,所以编写代码并不适用于每一个应用。但当它适用时,它可以是 LLM 表达计划的一种非常强大的方式。
这就结束了关于规划的部分。今天,规划型 Agentic AI 最强大的用途之一是高度 agentic 的软件编码器。事实证明,如果你要求一个高度 agentic 的软件编码辅助工具为你编写一个复杂的软件,它可能会想出一个详细的计划,先构建软件的这个组件,然后构建第二个组件,第三个,甚至可能计划在进行过程中测试这些组件。然后它形成一个清单,接着按部就班地执行。所以它在构建日益复杂的软件方面实际上工作得非常好。
对于其他应用,我认为规划的使用仍在增长和发展中。规划的一个缺点是,因为开发者不告诉系统具体要做什么,所以控制它有点难,而且你事先并不知道运行时会发生什么。但放弃一些这种控制,确实显著地增加了模型可能决定尝试的事情的范围。所以这项重要的技术有点前沿,在 agentic 编码(它在那里工作得很好)之外,感觉还不完全成熟,尽管我确定还有很大的发展空间。但希望你有一天能在你的一些应用中享受使用它。
这就结束了规划部分。在本模块中,我希望与你分享最后一个设计模式,那就是如何构建多代理系统。我们不是只有一个代理,而是有多个代理协同工作来为你完成任务。让我们在下一个视频中看一看。
5.4 多代理工作流
我们已经谈了很多关于如何构建单个代理来为你完成任务。在一个多代理或多 agentic 工作流中,我们转而让多个代理集合协作来为你做事。
有些人第一次听说多代理系统时会想,我为什么需要多个代理?它不就是我一遍遍提示的同一个 LLM,或者只是一台电脑吗?我为什么需要多个代理?
我发现一个有用的类比是,即使我可能在一台电脑上做事,我们也会把一台电脑上的工作分解成多个进程或多个线程。作为一名开发者,思考如何将工作分解成多个进程和多个计算机程序来运行——即使电脑上只有一个 CPU——这让我作为开发者更容易编写代码。同样地,如果你有一个复杂的任务要执行,有时,与其思考如何雇佣一个人来为你做,你可能会思考雇佣一个由几个人组成的团队,来为你完成任务的不同部分。
所以在实践中,我发现对于许多 agentic 系统的开发者来说,拥有这样的心智框架——不是问“我可能雇佣哪一个人来做某事”,而是“雇佣三四个不同角色的人来为我完成这个整体任务是否有意义”——这有助于提供另一种方式,将一个复杂的事情分解成子任务,并一次一个地为那些独立的子任务进行构建。
让我们来看一些这是如何工作的例子。
- 创建营销材料:以创建营销材料为例,假设你想推广太阳镜,你能为此制作一份营销手册吗?你可能需要团队里有一个研究员,来研究太阳镜的趋势和竞争对手提供什么。你可能还需要团队里有一个平面设计师,来渲染图表或你的太阳镜的好看图形。然后还需要一个写手,来把研究成果和图形资产整合在一起,制作成一份漂亮的宣传册。
- 撰写研究文章:或者,要写一篇研究文章,你可能需要一个研究员做在线研究,一个统计学家计算统计数据,一个主笔,然后一个编辑来完成一份润色过的报告。
- 准备法律案件:或者,要准备一个法律案件,真正的律师事务所通常会有助理、律师助理,也许还有一个调查员。
我们很自然地,因为人类团队的工作方式,可以想到各种复杂任务被分解给具有不同角色的不同个体的方式。所以这些例子说明了,复杂任务已经被自然地分解成了可以由具有不同技能的不同人来执行的子任务。
让我们以创建营销材料为例,详细看看研究员、平面设计师和写手可能会做什么。
- 研究员:研究员的任务可能是分析市场趋势和研究竞争对手。在设计研究代理时,需要记住的一个问题是,研究员可能需要哪些工具,才能就市场趋势和竞争对手的情况拿出一份研究报告。所以,一个 agentic 研究员可能需要使用的一个自然工具就是网络搜索。因为一个人类研究员,被要求做这些任务时,可能需要在线搜索才能完成他们的报告。
- 平面设计师:对于一个平面设计师代理,他们的任务可能是创作可视化图表和艺术作品。那么,一个 agentic 软件平面设计师可能需要哪些工具呢?嗯,他们可能需要图像生成和处理的 API。或者也许,类似于你在咖啡机例子中看到的,它可能需要代码执行来生成图表。
- 写手:最后,写手将研究成果转化为报告文本和营销文案。在这种情况下,除了 LLM 已经能做的生成文本的功能外,他们不需要任何工具。
在这个和下一个视频中,我将用这些紫色的框来表示一个代理。你构建单个代理的方式,就是提示一个 LLM 扮演研究员、平面设计师或写手的角色,取决于它是哪个代理的一部分。
例如,对于研究代理,你可能会提示它说:“你是一个研究代理,擅长分析市场趋势和竞争对手。请进行在线研究,为太阳镜产品分析市场趋势,并总结竞争对手的情况。” 这将让你能够构建一个研究员代理。同样地,通过提示一个 LLM 扮演一个带有适当工具的平面设计师,以及扮演一个写手,你就可以构建一个平面设计师代理和一个写手代理。
在构建了这三个代理之后,一种让它们协同工作以生成你最终报告的方式,是使用一个简单的线性顺序工作流,或者说,在这种情况下,一个线性计划。所以,如果你想为太阳镜创建一个夏季营销活动,你可能会把那个提示给研究代理。研究代理然后写一份报告说:“这是当前的太阳镜趋势和竞争产品。” 这份研究报告可以接着被提供给平面设计师,它查看研究发现的数据,并创作一些数据可视化图表和艺术作品选项。所有这些资产可以接着被传递给写手,它接着将研究成果和图形输出整合起来,撰写最终的营销手册。
在这种情况下,构建一个多代理工作流的优势是,在设计研究员、平面设计师或写手时,你可以一次只专注于一件事。所以我可以花些时间来构建我能做的最好的平面设计师代理,而也许我的合作者正在构建研究员代理和写手代理。最后,我们将它们串联起来,得到这个多代理系统。在某些情况下,我看到开发者们也开始复用一些代理。所以,为营销手册构建了一个平面设计师之后,也许我会考虑是否能构建一个更通用的平面设计师,既能帮我写营销手册,也能写社交媒体帖子,还能帮我为网页配图。所以,通过想出你可能雇佣哪些代理来完成一个任务——这有时会对应于你可能雇佣哪类人类员工来完成一个任务——你可以想出像这样的一个工作流,甚至可能构建出你可以在其他应用中选择复用的代理。
现在,你在这里看到的是一个线性计划,即一个代理(研究员)完成他的工作,然后是平面设计师,然后是写手。对于代理,作为线性计划的替代方案,你也可以让代理以更复杂的方式相互交互。
让我用一个使用多个代理进行规划的例子来说明。之前,你看到了我们如何可能给一个 LLM 一套它可以调用来执行不同任务的工具。在我将要向你展示的内容中,我们将转而给一个 LLM 调用不同代理的选项,请求不同的代理帮助完成不同的任务。
具体来说,你可能会写一个提示,比如:“你是一个营销经理,有以下代理团队可以合作”,然后给出代理的描述。这与我们用规划和使用工具所做的非常相似,只不过工具(绿色的框)被替换成了代理(这些紫色的框),LLM 可以调用它们。你也可以要求它返回一个执行用户请求的逐步计划。在这种情况下,LLM 可能会:
- 要求研究员研究当前的太阳镜趋势并报告。
- 然后它会要求平面设计师创作图像并报告。
- 接着要求写手创建一份报告。
- 然后也许 LLM 会选择最后一次审查或反思并改进报告。
在执行这个计划时,你会接着取第一步研究员的文本,执行研究,然后把它传递给平面设计师,再传递给写手,然后可能做最后一次反思步骤,然后就完成了。
对这个工作流一个有趣的看法是,就好像你上面有这三个代理,但左边的这个 LLM 实际上就像第四个代理,一个营销经理,它是一个营销团队的管理者,负责设定方向,然后将任务委派给研究员、平面设计师和写手代理。所以这实际上变成了一个由四个代理组成的集合,一个营销经理代理协调着研究员、平面设计师和写手的工作。
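这种「营销经理协调三个工作者」的层级模式,可以用如下草图示意。各智能体均为假设的桩函数;调用顺序在真实系统中应由经理 LLM 规划得出,这里为演示而写死:

```python
# 三个“工作者”智能体的桩实现;真实系统中每个都是一次带角色提示词的 LLM 调用
def researcher(context):
    return context + ["研究:当前太阳镜趋势与竞品报告"]

def designer(context):
    return context + ["设计:数据可视化图表与图像素材"]

def writer(context):
    return context + ["文案:整合研究与素材的最终营销手册"]

WORKERS = {"researcher": researcher, "designer": designer, "writer": writer}

def marketing_manager(request: str):
    """管理者智能体:按计划逐一委派任务,并把累积的成果传给下一个成员。
    真实实现中,这个调用顺序本身也应由经理 LLM 规划得出,此处为演示而写死。"""
    plan = ["researcher", "designer", "writer"]
    context = [f"任务:{request}"]
    for role in plan:
        context = WORKERS[role](context)
    return context

outputs = marketing_manager("夏季营销活动")
```

与线性模式的区别在于:所有结果都先回流到经理手中,由它决定下一步委派给谁,因此经理也可以在中途插入审核或反思步骤。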
在这个视频中,你看到了两种沟通模式。一种是线性的,你的代理一次只执行一个动作,直到你到达终点。第二种有一个营销经理协调着其他几个代理的活动。事实证明,在构建多 agentic 系统时,你可能最终必须做出的关键设计决策之一,就是你不同代理之间的沟通模式是什么。这是一个艰难的研究领域,并且正在涌现多种模式,但在下一个视频中,我想向你展示一些让你的代理相互合作的最常见的沟通模式。让我们在下一个视频中看看。
5.5 多代理系统的沟通模式
当你有一个团队的人一起工作时,他们沟通的模式可能相当复杂。实际上,设计一个组织结构图是相当复杂的,需要试图找出人们沟通、协作的最佳方式。事实证明,为多代理系统设计沟通模式也相当复杂。但让我向你展示一些我今天看到的不同团队使用的最常见的设计模式。
1. 线性模式
在一个带有线性计划的营销团队中,首先是研究员工作,然后是平面设计师,然后是写手,其沟通模式是线性的。研究员会与平面设计师沟通,然后研究员和平面设计师或许都会把他们的输出传递给写手。所以这是一个非常线性的沟通模式。这是我今天看到正在使用的两种最常见的沟通计划之一。
2. 层级模式
两种最常见的沟通计划中的第二种,类似于你在这个例子中看到的,即使用多个代理进行规划,其中有一个管理者与一些团队成员沟通并协调他们的工作。所以在这个例子中,营销经理决定调用研究员来做一些工作。然后,如果你把营销经理想象成接收报告,然后把它发送给平面设计师,再接收报告,然后发送给写手,这将是一种层级式的沟通模式。如果你真的在实现一个层级式的沟通模式,让研究员把报告传回给营销经理,可能会比让研究员直接把结果传递给平面设计师和写手更简单。所以这种类型的层级结构也是一种相当常见的规划沟通模式的方式,即你有一个管理者协调着其他一些代理的工作。
3. 深度层级模式
为了与你分享一些更高级、使用频率较低,但有时在实践中仍会使用的沟通模式,一种是更深的层级结构。和之前一样,如果你有一个营销经理向研究员、平面设计师、写手发送任务,但也许研究员自己有两个其他的代理可以调用,比如一个网络研究员和一个事实核查员。也许平面设计师就自己工作,而写手有一个初稿风格写手和一个引文检查员。这将是一个代理的层级组织,其中一些代理可能自己会调用其他的子代理。我也看到这在一些应用中使用,但这比一层层级结构要复杂得多,所以今天用得比较少。
4. 全连接模式
最后一个模式,执行起来相当有挑战性,但我看到一些实验性的项目在使用它,那就是全连接(all-to-all)的沟通模式。在这种模式中,任何人都可以在任何时候与任何其他人交谈。你实现这个的方式是,你提示你所有的四个代理(在这个例子中),告诉它们还有另外三个代理它们可以决定调用。每当你的一个代理决定向另一个代理发送消息时,那条消息就会被添加到接收方代理的上下文中。然后接收方代理可以思考一会儿,并决定何时回复第一个代理。所以,如果它们都能在一个群体中协作,互相交谈一会儿,直到,比如说,它们中的每一个都宣布自己完成了这个任务,然后它停止交谈。也许当每个人都认为完成了,或者也许当写手断定已经足够好了,那时你才生成最终的输出。
在实践中,我发现全连接沟通模式的结果有点难以预测。所以有些应用不需要高度的控制,你可以运行它,看看会得到什么。如果营销手册不好,也许没关系,你再运行一次,看看是否会得到不同的结果。但我认为,对于那些你愿意容忍一点混乱和不可预测性的应用,我确实看到一些开发者在使用这种沟通模式。
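如果想更具体地感受全连接模式,可以参考下面这个高度简化的草图:每个智能体都可以向任何其他智能体发消息,收到的消息进入接收方的「上下文」(这里简化为收件箱)。其中轮询调度、应答逻辑与停止条件都是为演示而假设的,真实系统中每一步都是一次 LLM 调用:

```python
from collections import deque

class Agent:
    """极简的全连接智能体:收到消息就回复发送者一次并宣布完成(桩逻辑)。"""
    def __init__(self, name):
        self.name = name
        self.inbox = deque()   # 其他智能体发来的消息会进入这里(即它的上下文)
        self.sent = False
        self.done = False

    def step(self, agents):
        if self.inbox:
            sender, _ = self.inbox.popleft()
            for other in agents:          # 向原发送者回一条应答
                if other.name == sender:
                    other.inbox.append((self.name, "ack"))
            self.done = True              # 桩逻辑:处理完一条消息即宣布完成
        elif not self.sent:
            for other in agents:          # 向所有其他智能体广播一条消息
                if other is not self:
                    other.inbox.append((self.name, "hello"))
            self.sent = True

def run_all_to_all(names, max_rounds=10):
    agents = [Agent(n) for n in names]
    for _ in range(max_rounds):           # 设轮数上限,避免“群聊”永不收敛
        for a in agents:
            if not a.done:
                a.step(agents)
        if all(a.done for a in agents):   # 全体宣布完成后停止
            break
    return all(a.done for a in agents)

finished = run_all_to_all(["researcher", "designer", "writer", "manager"])
```

即使在这个玩具版本里也能看到该模式的特点:消息到达顺序和终止时机取决于调度细节,这正是全连接模式难以预测、需要设置上限和停止条件的原因。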
所以,我希望这传达了多代理系统的一些丰富性。今天,也有相当多的软件框架支持轻松地构建多代理系统,它们也使得实现其中一些沟通模式相对容易。所以,也许如果你使用你自己的多代理系统,你会发现一些这些框架对于探索这些不同的沟通模式很有帮助。
这就把我们带到了本模块和本课程的最后一个视频。让我们进入最后一个视频做个总结。
5.6 结论
欢迎来到本课程的最后一个视频。感觉我们一起经历了很多,就你和我,我们在 Agentic AI 领域探讨了许多主题。让我们回顾一下。
- 在第一个模块中,我们谈到了你可以用 Agentic AI 构建哪些以前不可能实现的应用。然后我们开始看关键的设计模式。
- 我们探讨了反思设计模式,这是一个简单的方法,有时能给你的应用带来不错的性能提升。
- 然后是工具使用或函数调用,它扩展了你的 LLM 应用能做的事情,其中代码执行是一个重要的特例。
- 接着我们花了很多时间讨论评估以及错误分析,以及如何推动一个规范的构建与分析流程,从而高效地不断提升你的 agentic AI 系统的性能。这第四个模块中的一些材料,我认为在你持续构建 Agentic AI 系统的过程中,你会发现它们是最有用的,我希望会是这样很长一段时间。
- 然后在这个模块中,我们谈到了规划和多代理系统,它们能让你构建更强大,尽管有时更难控制、更难预先预测的系统类型。
所以,凭借你从本课程学到的技能,我想你现在知道如何构建很多酷炫、激动人心的 Agentic AI 应用了。当我的团队,或者我看到其他团队,面试求职者时,我发现面试官常常试图评估候选人是否具备你在这门课程中学到的大部分技能。所以我希望这门课程也能为你开启新的职业机会,并且你会做得更多。无论你是为了好玩,还是为了专业的实际应用场景而做这些事,我想你会享受你现在可以构建的这一系列新事物。
最后,我想再次感谢你花这么多时间和我在一起。我希望你能带着这些技能,负责任地使用它们,然后去创造一些酷炫的东西。
Module 5: Patterns for Highly Autonomous Agents
5.1 Planning workflows
0:03
Welcome to this final module where you learn about design patterns that lets you build
0:05
highly autonomous agents, where you don't need to hard code in advance the sequence of steps to take,
0:11
but it can be more flexible and decide for itself what steps it wants to take to accomplish a task.
0:16
We'll talk about the planning design pattern and then later in this module,
0:20
how to build multi-agent systems. Let's dive in.
0:23
Suppose you run a sunglasses retail store and have information on what sunglasses are in your
0:30
inventory stored in a database. You might want a customer service agent to be able to answer
0:35
questions like, do you have any round sunglasses in stock? They're under $100. This is a fairly
0:40
complex query because you have to look through the product descriptions to see what sunglasses
0:45
are round, then look at what is in stock, and then finally see what's under $100 in order to tell the
0:52
customer, yes, we have classic sunglasses. How do you build an agent to answer a broad range of
0:59
customer queries like this and many others? In order to do so, we're going to give LLM
1:05
a set of tools to let it get item descriptions, such as look up if different glasses are round,
1:11
check inventory, maybe process item returns, which is not needed for this query,
1:15
but we need it for other queries, get item price, check past transactions, process item sale,
1:22
and so on. In order to let an LLM figure out what's the right sequence of tools to use to respond to
1:29
the customer request, you might then write a prompt like this. You have access to the following tools
1:35
and give it a description of each of the, say, six tools or even more tools that the LLM has,
1:40
and to then tell it to return a step-by-step plan to carry out the user's request. In this case,
1:47
to answer this particular query, a reasonable plan that an LLM might output might be to first
1:53
use get item descriptions to check the different descriptions to find the round sunglasses,
1:59
and then use check inventory to see if they're in stock, and use get item price to see if the
2:03
in-stock results are less than $100. After an LLM outputs this plan with three steps, what we can do
2:11
is then take the step one text, that is this text written here in red, and pass that to an LLM,
2:18
maybe with additional context about what are the tools with your user query, with additional
2:22
background context, and pass an LLM to carry out step one. And in this case, hopefully the LLM will
2:29
choose to call the get item descriptions to get the appropriate descriptions of items,
2:34
and the output of that first step can let it select which are the round sunglasses,
2:39
and that output of step one is then passed together with the step two instructions, that
2:45
would be these instructions that I have here in blue, to an LLM to then execute the second step
2:49
of the plan. Hopefully it will then take the two pairs of round sunglasses we found on the previous
2:55
slide and check the inventory, and the output of that second step is then used to another LLM call,
3:02
where you have the output of the second step as well as the instructions of what to do for step
3:06
three. Pass the LLM to have it get the item price, and finally this output is fed back to
3:12
the LLM one last time to generate the final answer for the user. In this slide, I've simplified a lot
3:17
of details a little bit. The actual plan typically written by the LLM is more detailed than these
3:22
simple one-line instructions, but the basic workflow is to have an LLM write out multiple
3:27
steps of a plan, and then task it to execute each step of the plan in turn with some appropriate
3:33
surrounding context about what is the task, what are the tools available, and so on. And the
3:38
exciting thing about using an LLM to plan this way is that we did not have to decide in advance
3:45
what is the sequence in which to call tools in order to answer a fairly complex customer request.
3:52
If a customer were to make a different request, such as I would like to return the gold frame
3:58
glasses that I had purchased but not the metal frame ones, then you can imagine an LLM similarly
4:05
being able to come up with a different plan to figure out based on what they had purchased previously,
4:09
which glasses they had bought based on get item descriptions, where the gold frame ones they want
4:14
to return, and then maybe call process item return. So with an agent that can plan like this, it can
4:19
carry out a much wider range of tasks that can require calling many different tools in many
4:25
different orders. One more example of planning, let's take a look at an email assistant. If you
4:30
want to be to tell your assistant, so please reply to that email invitation from Bob in New York,
4:34
tell him to attend and archive his email. Then an email assistant may be given tools like this to
4:40
search email, move an email, delete an email, and send an email. And you might write an assistant
4:44
prompt saying you have access to the following tools, and again please return the step-by-step plan.
4:48
In this case, maybe the LLM will say the steps for this are to use search email to find the email from
4:54
Bob that mentioned dinner and New York, and then generate and send an email to confirm attendance,
4:59
and then lastly move that email to the archive folder. Given this plan, which looks a reasonable
5:04
one, you would then again task an LLM step-by-step to carry out this plan. So the text from the first
5:11
step, shown here in red, will be fed to the LLM with additional background context, and hopefully
5:16
it'll trigger search email. Then the output of that can be given to an LLM again with the step
5:22
two instructions to send an appropriate response. And then finally, assuming the email was sent
5:27
successfully, you can take that output and have the LLM execute the third step of moving the email
5:33
from Bob into the archive folder. The planning design pattern is already used successfully
5:39
in many highly agentic coding systems, where if you ask it to write a piece of software to build
5:45
some fairly complex application, it might actually come up with a plan to build this component, build
5:51
this component, to almost form a checklist, and then do those steps one at a time to build a
5:56
decently complex piece of software. For many other applications, the use of planning is still
6:02
maybe more experimental. It's not in very widespread use. And one of the challenges of
6:07
planning is it makes the system sometimes a little bit hard to control, because you as a developer,
6:13
you don't really know at runtime what plan it will come up with. And so I think outside highly
6:19
agentic coding systems, where it actually works really well, adoption of planning is still growing
6:24
in other sectors. But this is exciting technology, and I think it will keep getting better and we'll
6:29
see it in more and more applications. The cool thing about building agents that can plan for
6:34
themselves is you don't need to hard code in advance the exact sequence of steps an LLM may
6:39
take to carry out a complex task. Now, I know that in this video, I've gone over the planning process
6:45
at a fairly high level, with it all putting a list of steps and then tasking an LLM to carry
6:51
out the steps of the plan one step at a time. But how does this actually work? In the next video,
6:57
we'll take a deeper dive to look further into the guts of what these plans actually look like,
7:03
and how the strings together to have an LLM plan and execute the plan for you.
7:07
Let's take a look at that in the next video.
5.2 Creating and executing LLM plans
0:04
In this video, we'll look in detail at how to prompt an LLM to generate a plan,
0:04
and how to read, interpret, and execute that plan. Let's dive in.
0:08
This is a plan that you saw in the previous video for the customer service agents,
0:13
and I have presented this plan at a high level using simple text descriptions.
0:17
Let's take a look at how you can get an LLM to write very clear plans that go a little bit
0:23
beyond these simple high-level text descriptions. It turns out that many developers will ask an LLM
0:30
to format the plan it once executed in JSON format, because this allows downstream code
0:36
to parse what exactly are the steps of the plan in relatively clear and unambiguous ways,
0:42
and all of the leading LLMs are pretty good at generating JSON outputs at this point.
0:46
So the system prompt might say something like this. You have access to the following tools,
0:51
and then create a step-by-step plan in JSON format, and you might describe the JSON format
0:56
in enough detail with the goal of getting it to output a plan like that shown here on the right.
1:02
So in this JSON output, it creates a list where the first list item has clear keys and values
1:09
that say step one of the plan has the following description, and it should use the following tool
1:15
with the following arguments parsed to that tool. Then after that, step two of the plan
1:20
is to carry out this task, and then use this tool, and so on. So this JSON format, as opposed to
1:26
writing the plan in English, allows downstream code to more clearly parse out exactly what are
1:32
the steps of the plan so that it can be reliably executed one step at a time. Instead of JSON,
1:38
I also see some developers use XML, where you can use XML delimiters. You use XML tags
1:45
to clearly specify what are the steps of the plan and what step number it is.
1:50
Some developers, I feel like fewer developers, will use markdown, which is just sometimes
1:55
slightly more ambiguous in terms of how we parse it, and I think plain text is maybe the least
2:00
reliable of these options. But I think either JSON, which I'm showing here, or XML would be
2:05
good options for how to ask the LLM to format a plan unambiguously. So that's it. By opening
2:12
plans in JSON, you can then parse it and have downstream workflows execute different steps of
2:18
the plan more systematically. Now, in terms of getting LLMs to plan, it turns out there's one
2:24
other really neat idea that lets an LLM output very complex plans and get them executed reliably,
2:31
and that's to let them write code and to have code express the plan.
2:35
Let's take a look at this in the next video.
5.3 Planning with code execution
0:03
Planning with code execution is the idea that, instead of asking an LLM to output a plan in,
0:06
say, JSON format to execute one step at a time, why not have the LLM just try to write code and
0:12
that code can capture multiple steps of the plan, like call this function, then call this function,
0:17
then call this function, and by executing code generated by the LLM, we can actually carry out
0:22
fairly complex plans. Let's take a look at when you might want to use this technique.
Let's say you want to build a system to answer questions about coffee machine sales based on a spreadsheet of previous sales data like this. You might give an LLM a set of tools like these: get_column_max, which looks at a certain column and gets the maximum value, so the system can answer questions like "what's the most expensive coffee?"; as well as get_column_mean, filter_rows, get_column_min, get_column_median, sum_rows, and so on. These are examples of the range of tools you might give an LLM to process this spreadsheet, these rows and columns of data, in different ways.

Now, if a user were to ask which month had the highest sales of hot chocolate, it turns out you can answer this query using these tools, but it's pretty complicated. You'd have to use filter_rows to extract the January transactions for hot chocolate, then compute statistics on that, then repeat for February, then March, April, May, all the way through December, and finally take the max. So you can actually string it together as a pretty complicated process using these tools, but it's not a great solution. Worse, were someone to ask how many unique transactions there were last week, these tools are insufficient to get that answer, so you may end up creating a new tool, get_unique_entries. Or you may run into another query, "what were the amounts of the last five transactions?", and have to create yet another tool to answer that. In practice, I've seen teams, as they run across more and more queries, end up creating more and more and more tools to try to give the LLM enough coverage for the full range of things someone may ask about a dataset like this.
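To make the awkwardness concrete, here is a rough sketch of answering "which month had the highest hot chocolate sales?" by composing single-purpose tools. The tool implementations and the inline data are hypothetical stand-ins; real tools would wrap a spreadsheet library.

```python
# Hypothetical single-purpose tools of the kind described above.
SALES = [
    {"month": 1, "item": "hot chocolate", "amount": 120.0},
    {"month": 1, "item": "espresso", "amount": 80.0},
    {"month": 2, "item": "hot chocolate", "amount": 150.0},
    {"month": 3, "item": "hot chocolate", "amount": 90.0},
]

def filter_rows(rows, key, value):
    return [r for r in rows if r[key] == value]

def sum_rows(rows, column):
    return sum(r[column] for r in rows)

# Answering this one query means looping the tools over all twelve months
# by hand -- exactly the complicated, repetitive process described above.
totals = {}
for month in range(1, 13):
    hot_choc = filter_rows(filter_rows(SALES, "item", "hot chocolate"), "month", month)
    totals[month] = sum_rows(hot_choc, "amount")

best_month = max(totals, key=totals.get)
print(best_month)
```

Every new kind of question would need yet another hand-rolled loop like this, or yet another tool.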
So this approach is brittle and inefficient, and I've seen teams continuously dealing with edge cases and trying to create more tools. But it turns out there is a better way, which is to prompt the LLM: please write code to solve the user's query, and return your answer as Python code, maybe delimited with beginning and ending execute-Python XML tags. Then the LLM can just write code to load the spreadsheet into a data processing library; here it's using the pandas library. And here it is actually coming up with a plan. The plan is: after loading the CSV, first ensure the date column is parsed a certain way, then sort by the date, select the last five transactions, show just the price column, and so on. These are steps one, two, three, four, and five, say, of the plan. A programming language like Python, in this example with the pandas data processing library imported, has many built-in functions, hundreds or even thousands of them, and moreover, these are functions the LLM has seen a lot of data on: how to call them, and when. By letting your LLM write code, it can choose from these hundreds or thousands of relevant functions that it has already seen a lot of data on when to use. This lets it string together different choices of functions from this very large library in order to come up with a plan for answering a fairly complex query like this.
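The five-step plan just described might come out looking like the following sketch. The column names and the inline stand-in data are assumptions for illustration; a real run would read the actual sales CSV.

```python
import io

import pandas as pd

# A tiny stand-in for the sales spreadsheet (the real data would come from a CSV file).
csv_data = io.StringIO(
    "date,item,price\n"
    "2024-01-03,hot chocolate,4.50\n"
    "2024-01-05,espresso,3.00\n"
    "2024-01-08,latte,5.25\n"
    "2024-01-09,hot chocolate,4.50\n"
    "2024-01-12,espresso,3.00\n"
    "2024-01-15,latte,5.25\n"
)

df = pd.read_csv(csv_data)               # step 1: load the data
df["date"] = pd.to_datetime(df["date"])  # step 2: ensure the date column is parsed
df = df.sort_values("date")              # step 3: sort by the date
last_five = df.tail(5)                   # step 4: select the last five transactions
print(last_five["price"].tolist())       # step 5: show just the price column
```

Each comment is one step of the plan, and executing the script carries the whole plan out at once, with no per-step dispatch machinery needed.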
Just one more example. If someone were to ask how many unique transactions there were last week, the LLM can come up with a plan to read the CSV file, parse the date column, define the time window, filter rows, drop duplicate rows, and count. The details of this aren't important, but hopefully what you can see is that, if you read the comments, the LLM is roughly coming up with a multi-step plan and expressing each of the steps in code that you can then just execute, and this will get the user their answer. So for applications where the task can plausibly be done by writing code, letting an LLM express its plan in software code that you can just execute can be a very powerful way to let it write rich plans. And of course, the caveat I mentioned in the module on tool use also applies: consider whether you need a safe execution environment, like a sandbox, to run the code. Although, even though it's probably not the best practice, I also know a lot of developers that don't use a sandbox. Lastly, it turns out that planning with code works well.
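As a minimal precaution, short of a full sandbox, LLM-generated code can at least be run in a separate process with a timeout. This sketch uses only the Python standard library; it is an assumption about one possible setup, and a real sandbox would add filesystem and network isolation on top (for example via containers or a remote execution service).

```python
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Run generated code in a separate Python process with a timeout.

    This gives crash and infinite-loop isolation only; it does NOT restrict
    what the code can touch on disk or over the network.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=timeout
    )
    return result.stdout

print(run_untrusted("print(2 + 2)"))
```

A timeout plus a subprocess is the bare minimum; teams that skip even this are trusting the model's output completely.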
From this diagram, adapted from a research paper by Xingyao Wang and others, you can see that for many different models, on the tasks they examined, code-as-action, in which the LLM is invited to write code and take actions through code, is superior to having it write JSON and then translate the JSON into actions, or text. You also see a trend that writing code outperforms having the LLM write a plan in JSON, and writing a plan in JSON is a bit better than writing a plan in just plain text.
Now, of course, there are applications where you might want to give your own custom tools to an LLM, and so writing code isn't for every single application. But when it does apply, it can be a very powerful way for an LLM to express a plan.

So that wraps up the section on planning. Today, one of the most powerful uses of agentic AI that plans is highly agentic software coders. It turns out that if you ask one of the highly agentic software coding assistant tools to write a complex piece of software for you, it may come up with a detailed plan: build this component of the software first, then build a second component, then a third, maybe even plan to test out the components as it goes along. It then forms a checklist that it goes through, executing one step at a time.
And so it actually works really well for building increasingly complex pieces of software. For other applications, I think the use of planning is still growing and developing. One of the disadvantages of planning is that, because the developer doesn't tell the system exactly what to do, it's a little bit harder to control, and you don't really know in advance what will happen at runtime. But giving up some of this control significantly increases the range of things the model may decide to try out. So this important technology is cutting edge and doesn't feel completely mature outside of maybe agentic coding, where it works well, although I'm sure there's still a lot of room to grow. But hopefully you'll enjoy using it in some of your applications someday.
That wraps up planning. There's one last design pattern I hope to share with you in this module, which is how to build multi-agent systems, where we have not just one agent but many of them working in collaboration to complete tasks for you. Let's take a look at that in the next video.
5.4 Multi-agent workflows
We've talked a lot about how to build a single agent to complete tasks for you. In a multi-agent workflow, we instead have a collection of multiple agents collaborating to do things for you. When some people hear about multi-agent systems for the first time, they wonder: why do I need multiple agents? It's just the same LLM that I'm prompting over and over, on just one computer. Why do I need multiple agents?

I find one useful analogy is that, even though I may do things on a single computer, we do decompose work on a single computer into multiple processes or multiple threads. As a developer, even though it's one CPU on one computer, thinking about how to decompose work into multiple processes makes it easier for me to write the code. In a similar way, if you have a complex task to carry out, sometimes, instead of thinking about how to hire one person to do it for you, you might think about hiring a team of a few people to do different pieces of the task. And so in practice, I've found that for many developers of agentic systems, having this mental framework, not asking "what's the one person I might hire to do this?" but instead "would it make sense to hire people in three or four different roles to do this overall task?", gives another way to take a complex thing, decompose it into sub-tasks, and build those individual sub-tasks one at a time. Let's take a look at some examples of how this works.
Take the task of creating marketing assets. Say you want to market sunglasses: can you come up with a marketing brochure for that? You might need a researcher on your team to look at trends in sunglasses and what competitors are offering. You might also have a graphic designer on your team to render charts or nice-looking graphics of your sunglasses. And then also a writer to take the research and the graphic assets and put it all together into a nice-looking brochure. Or, to write a research article, you might want a researcher to do online research, a statistician to calculate statistics, a lead writer, and then an editor to produce a polished report. Or, to prepare a legal case, real law firms will often have associates, paralegals, and maybe an investigator. Because of the way human teams work, we can naturally think of different ways that complex tasks can be broken down across different individuals with different roles. So these are examples where a complex task is already naturally decomposed into sub-tasks that different people with different skills can carry out.

Take the example of creating marketing assets, and look in detail at what a researcher, graphic designer, and writer might do.
A researcher might have the task of analyzing market trends and researching competitors. When designing the researcher agent, one question to keep in mind is: what tools might the researcher need in order to produce a research report on market trends and on what competitors are doing? One natural tool an agentic researcher might need is web search, because a human researcher asked to do these tasks might need to search online in order to write their report. A graphic designer agent might be tasked with creating visualizations and artwork. So what tools might an agentic graphic designer need? It may need image generation and manipulation APIs, or maybe, similar to what you saw with the coffee machine example, it needs code execution to generate charts. And lastly, the writer is tasked with transforming the research into report text and marketing copy; in this case, it doesn't need any tools beyond what an LLM can already do to generate text.
In this and the next video, I'm going to use these purple boxes to denote an agent. The way you build an individual agent is by prompting an LLM to play the role of a researcher, a graphic designer, or a writer, depending on which agent it is. So, for example, for the researcher agent, you might prompt it: "You are a research agent, expert at analyzing market trends and competitors. Carry out online research to analyze market trends for the sunglasses product, and also summarize what competitors are doing." That would give you a researcher agent. Similarly, by prompting an LLM to act as a graphic designer with the appropriate tools, and to act as a writer, you can build a graphic designer agent as well as a writer agent.
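Structurally, an agent in this sense is just a role prompt plus a set of tools. Here is a minimal sketch of that structure; llm() and web_search() are hypothetical stand-ins for a chat-completion API and a search API, not real libraries.

```python
def web_search(query: str) -> str:
    # Stand-in for a real search API call; shown only to illustrate shape.
    return f"search results for {query!r}"

def llm(system_prompt: str, user_message: str) -> str:
    # Stand-in for a chat-completion call. A real agent loop would also let
    # the model decide when to invoke its tools.
    return f"(answer in role {system_prompt[:20]!r}...) {user_message}"

class Agent:
    """An agent is a role prompt plus the tools the model may call."""

    def __init__(self, role_prompt: str, tools: dict):
        self.role_prompt = role_prompt
        self.tools = tools

    def run(self, task: str) -> str:
        return llm(self.role_prompt, task)

researcher = Agent(
    role_prompt="You are a research agent, expert at analyzing market trends and competitors.",
    tools={"web_search": web_search},
)
print(researcher.run("Analyze market trends for sunglasses."))
```

Building the graphic designer and writer agents would mean constructing the same class with a different role prompt and tool set.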
Having built these three agents, one way to have them work together to generate your final report is to use a simple linear agentic workflow, a linear plan in this case. If you want to create a summer marketing campaign for sunglasses, you might give that prompt to the researcher agent. The researcher agent then writes a report: here are the current sunglasses trends and competitive offerings. This research report can then be fed to the graphic designer, which looks at the data the researcher found and creates a few data visualization and artwork options. All these assets can then be passed to the writer, which takes the research and the graphic output and writes the final marketing brochure. The advantage of building a multi-agent workflow in this case is that when designing the researcher, graphic designer, or writer, you can focus on one thing at a time. So I can spend some time building the best graphic designer agent I can, while maybe my collaborators are building the researcher and writer agents, and in the end we string it all together into this multi-agent system.
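The linear workflow just described, with each agent's output fed to the next, can be sketched as three chained calls. llm() here is a hypothetical stand-in for any chat-completion API, and the role prompts are illustrative.

```python
def llm(system_prompt: str, user_message: str) -> str:
    # Stand-in for a real model call; it just labels output with the role.
    role = system_prompt.split(" agent")[0]
    return f"{role} output based on: {user_message}"

ROLES = {
    "researcher": "researcher agent: analyze market trends and competitors",
    "designer": "graphic designer agent: create visualizations and artwork",
    "writer": "writer agent: turn research and graphics into a brochure",
}

def linear_workflow(user_request: str) -> str:
    research = llm(ROLES["researcher"], user_request)           # step 1
    graphics = llm(ROLES["designer"], research)                 # step 2: sees the research
    brochure = llm(ROLES["writer"], f"{research}\n{graphics}")  # step 3: sees both
    return brochure

print(linear_workflow("Create a summer marketing campaign for sunglasses."))
```

Because each stage only consumes the previous stage's output, the three agents can be developed and tested independently before being strung together.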
And in some cases, I'm seeing developers start to reuse agents as well. Having built a graphic designer for marketing brochures, maybe I'll think about whether I can build a more general graphic designer that can help me with marketing brochures as well as social media posts, and also help me illustrate web pages. So by thinking about what agents you might hire to do a task, which will sometimes correspond to the types of human employees you might hire, you can come up with a workflow like this, maybe even building agents that you can choose to reuse in other applications as well.
Now, what you see here is a linear plan, where one agent, the researcher, does its work, then the graphic designer, and then the writer. As an alternative to a linear plan, you can also have agents interact with each other in more complex ways. Let me illustrate with an example of planning using multiple agents.
Previously, you saw how we may give an LLM a set of tools that it can call to carry out different tasks. Here, we will instead give an LLM the option to call on different agents, asking them to help complete different tasks. In detail, you might write a prompt like: "You are a marketing manager. You have the following team of agents to work with," followed by a description of the agents. This is very much like what we did with planning and tool use, except the tools, the green boxes, are replaced with agents, these purple boxes that the LLM can call on. You can also ask it to return a step-by-step plan to carry out the user's request. In this case, the LLM may ask the researcher to research current sunglasses trends and report back, then ask the graphic designer to create the images and report back, then ask the writer to create a report, and maybe finally the LLM will choose to review, or reflect on and improve, the report one last time. In executing this plan, you would take the step-one output of the researcher, carry out the research, pass that to the graphic designer, then to the writer, and then maybe do one final reflection step, and you'd be done.
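A sketch of this delegation loop is below. For illustration the manager's plan is hard-coded; in a real system the manager LLM would generate it (for example in JSON) from the agent descriptions, and run_worker() would actually prompt the named agent.

```python
def run_worker(agent_name: str, task: str) -> str:
    # Stand-in for prompting the named worker agent with its role prompt.
    return f"{agent_name} result for: {task.splitlines()[0]}"

# The manager's plan over its team of agents (hypothetical, hand-written here).
manager_plan = [
    ("researcher", "research current sunglasses trends"),
    ("graphic_designer", "create images from the research"),
    ("writer", "write the marketing report"),
]

# Each worker reports back to the manager, which forwards the accumulated
# context along to the next worker in the plan.
context = ""
for agent_name, task in manager_plan:
    result = run_worker(agent_name, f"{task}\ncontext so far: {context}".strip())
    context += result + "\n"

print(context)
```

Routing every result back through the manager, rather than agent-to-agent, is what makes this pattern hierarchical.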
One interesting view of this workflow is that you have these three agents up here, but the LLM on the left is actually like a fourth agent: a marketing manager, a manager of a marketing team, that sets direction and then delegates tasks to the researcher, the graphic designer, and the writer agents. So this becomes a collection of four agents, with a marketing manager agent coordinating the work of the researcher, the graphic designer, and the writer.
In this video, you saw two communication patterns. One was a linear one, where your agents took actions one at a time until you got to the end. The second had a marketing manager coordinating the activity of a few other agents. It turns out that one of the key design decisions you may have to make when building multi-agent systems is: what is the communication pattern between your different agents? This is an area of active research, and there are multiple patterns emerging, but in the next video, I want to show you some of the most common communication patterns for getting your agents to work with each other. Let's go see that in the next video.
5.5 Communication patterns for multi-agent systems
When you have a team of people working together, the patterns by which they communicate can be quite complex. In fact, designing an organizational chart, figuring out the best way for people to communicate and collaborate, is pretty hard. It turns out that designing communication patterns for multi-agent systems is also quite complex. But let me show you some of the most common design patterns I see used by different teams today.
In the marketing team with a linear plan, where first the researcher worked, then the graphic designer, then the writer, the communication pattern was linear: the researcher communicates with the graphic designer, and both the researcher and the graphic designer pass their outputs to the writer. This is one of the two most common communication patterns I see being used today.

The second of the two most common patterns is similar to what you saw in the example of planning with multiple agents, where there is a manager that communicates with a number of team members and coordinates their work. In that example, the marketing manager decides to call on the researcher to do some work. If you think of the marketing manager as getting the report back, sending it to the graphic designer, getting a report back, and then sending it to the writer, this is a hierarchical communication pattern. If you're actually implementing a hierarchical pattern, it will probably be simpler to have the researcher pass the report back to the marketing manager rather than pass results directly to the graphic designer and the writer. This type of hierarchy, where one manager coordinates the work of a number of other agents, is also a pretty common way to organize communication.

Let me also share some more advanced, less frequently used, but nonetheless sometimes used in practice communication patterns. One is a deeper hierarchy: as before, a marketing manager sends tasks to the researcher, graphic designer, and writer, but maybe the researcher itself has two other agents it calls on, such as a web researcher and a fact checker. Maybe the graphic designer just works by itself, whereas the writer has an initial style writer and a citation checker. This is a hierarchical organization of agents in which some agents may themselves call other sub-agents. I do see this used in some applications, but it is much more complex than a one-level hierarchy, so it's used less often today.

One final pattern, which is quite challenging to execute but which I see a few experimental projects use, is the all-to-all communication pattern. In this pattern, anyone is allowed to talk to anyone else at any time. The way you implement this is to prompt all four of your agents, in this case, telling each of them that there are three other agents they can decide to call on. Whenever one of your agents decides to send a message to another agent, that message gets added to the receiving agent's context. The receiving agent can then think for a while and decide when to get back to the first agent. So the agents can all collaborate as a crowd and talk to each other for a while until, say, each of them declares that it is done with the task and stops talking. Maybe when everyone thinks it's done, or maybe when the writer concludes the output is good enough, that's when you generate the final output. In practice, I find the results of all-to-all communication patterns a bit hard to predict.
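The mechanics of all-to-all communication, where sending a message appends it to the receiver's context, can be sketched as agents with mailboxes. This is a toy illustration: deliberate() is a hypothetical stand-in for prompting the agent's LLM to decide whom to message next or whether to declare itself done.

```python
class Agent:
    def __init__(self, name: str):
        self.name = name
        self.context = []   # messages received from other agents
        self.done = False

    def receive(self, sender: str, message: str):
        # Any agent may message any other; the message lands in the
        # receiver's context for it to act on later.
        self.context.append((sender, message))

    def deliberate(self):
        # A real agent would prompt an LLM with self.context and decide whom
        # to message, or declare itself done. To keep the sketch runnable,
        # this stub declares itself done once it has heard anything.
        if self.context:
            self.done = True

agents = {name: Agent(name) for name in ["manager", "researcher", "designer", "writer"]}

# The manager broadcasts a kickoff message to everyone else ...
for name, agent in agents.items():
    if name != "manager":
        agent.receive("manager", "kick off the summer sunglasses campaign")
# ... and anyone may message anyone back at any time.
agents["manager"].receive("writer", "draft looks good")

for agent in agents.values():
    agent.deliberate()

# The run ends once every agent declares itself done.
print(all(agent.done for agent in agents.values()))
```

Even in this toy form you can see why the pattern is hard to predict: which messages get sent, and in what order, is decided by each agent rather than by the developer.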
Some applications don't need high control: you can run it and see what you get. If the marketing brochure isn't good, maybe that's okay; you just run it again and see if you get a different result. So for applications where you're willing to tolerate a little bit of chaos and unpredictability, I do see some developers using this communication pattern. That, I hope, conveys some of the richness of multi-agent systems. Today, there are quite a lot of software frameworks that support easily building multi-agent systems, and they also make implementing some of these communication patterns relatively easy. So if you build your own multi-agent system, you may find some of these frameworks helpful for exploring these different communication patterns as well. And that now brings us to the final video of this module and of this course. Let's go on to the final video to wrap up.
5.6 Conclusion
Welcome to the final video of this course. It feels like we've been through a lot together, just you and me, and we've gone through a lot of topics in Agentic AI. Let's take a look. In the first module, we talked about the applications you can build with Agentic AI that just were not possible before. We then started to look at key design patterns, including the reflection design pattern, which is a simple way to sometimes give your application a nice performance boost, and then tool use, or function calling, which expands what your LLM application can do, with code execution being one important case of that. Then we spent a lot of time talking about evaluations, as well as error analysis, and how to drive a disciplined process of building, as well as analyzing, to be efficient in how you keep improving the performance of your agentic AI system. That fourth module contains some of the material that I think you will find most useful as you keep building Agentic AI systems, I hope, for a long time. And then in this module, we talked about planning and multi-agent systems, which let you build much more powerful, although sometimes harder-to-control and harder-to-predict, types of systems.

With the skills you've learned from this course, I think you now know how to build a lot of cool, exciting Agentic AI applications. When my team, or other teams I see, interview people for jobs, I find the interviews often try to assess whether candidates have pretty much the skills you're learning in this course. And so I hope that this course will also open up new professional opportunities for you. Whether you're doing these things for fun or in professional, practical settings, I think you'll enjoy this new set of things you can now build. So, just to wrap up, I want to thank you again for spending all this time with me, and I hope you will take these skills, use them responsibly, and just go build cool stuff.