Andrew Ng · 2025-10-17
Andrew Ng: Agentic AI
Module 1: Introduction to Agentic Workflows
1.0 Introduction
Welcome to this course on Agentic AI. When I coined the term agentic to describe what I saw as an important and rapidly growing trend in how people were building LLM-based applications, what I did not realize was that a bunch of marketers would get hold of this term and use it as a sticker and put it on almost everything in sight. And that has caused hype on Agentic AI to skyrocket. The good news, though, is that ignoring the hype, the number of truly valuable and useful applications built using Agentic AI has also grown very rapidly, even if not quite as rapidly as the hype. In this course, what I'd like to do is show you best practices for building Agentic AI applications, and this will open up a lot of new opportunities to you in terms of what you can now build.

Today, agentic workflows are being used to build applications like customer support agents, to do deep research and help write deeply insightful research reports, to process tricky legal documents, or to look at patient input and suggest possible medical diagnoses. On many of my teams, a lot of the projects we built would just be impossible without agentic workflows. So knowing how to build applications with them is one of the most important and valuable skills in AI today.

It turns out that one of the biggest differences I've seen between people that really know how to build agentic workflows and people that are less effective at it is the ability to drive a disciplined development process, specifically one focused on evals and error analysis. In this course, I'll tell you what that means and show you what allows you to be really good at building these agentic workflows. Being able to do this is one of the most important skills in AI today, and it will open up a lot more opportunities, be it job opportunities or opportunities to just build amazing software yourself. With that, let's go on to the next video to dive more into what agentic workflows are.
1.1 What is agentic AI?
So what is Agentic AI, and why are Agentic AI workflows so powerful? Let's take a look. The way that many of us use large language models, or LLMs, today is by prompting one to, say, write an essay for us on a certain topic X. I think of that as akin to going to a human, or in this case going to an AI, and asking it to please type out an essay by writing from the first word to the last word all in one go, without ever using backspace. It turns out that we as people don't do our best writing like that, forced to write in this completely linear order, and nor do AI models. But despite the difficulty of being constrained to write in this way, LLMs do surprisingly well.

In contrast, with an agentic workflow, this is what the process might look like. You may ask it to first write an essay outline on a certain topic, then ask if it needs to do any web research. After doing some web research and maybe downloading some web pages, it writes the first draft, then reads the first draft to see what parts need revision or more research, then revises the draft, and so on. This type of workflow is more akin to doing some thinking and some research, then some revision, then some more thinking, and so on. With this iterative process, it turns out that an agentic workflow can take longer, but it delivers a much better work product.

So an agentic AI workflow is a process where an LLM-based app executes multiple steps to complete a task. In this example, you might use an LLM to write the first essay outline, and then you might use an LLM to decide what search terms to type into a web search engine, or really what search terms to call a web search API with, in order to get back relevant web pages. Based on that, you can feed the downloaded web pages into an LLM to have it write the first draft, and then maybe use another LLM to reflect and decide what needs more revision. Depending on how you design this workflow, perhaps you may even add a human-in-the-loop step, where the LLM has the option to request human review, maybe of some key facts. Based on that, it may then revise the draft, and this process results in a much better work output.
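To make this concrete, here is a minimal sketch of such a workflow in Python. The `llm(prompt)` and `web_search(query)` helpers are hypothetical placeholders standing in for whatever model provider and search API you use; a real implementation would wrap their SDK calls and add error handling.

```python
# A minimal sketch of the iterative essay workflow. The llm() and
# web_search() helpers are hypothetical placeholders for your model
# provider and search API.

def write_essay(topic: str, max_revisions: int = 2) -> str:
    outline = llm(f"Write an essay outline on: {topic}")
    queries = llm(f"List web search queries to research this outline:\n{outline}")
    research = "\n".join(web_search(q) for q in queries.splitlines() if q.strip())
    draft = llm(f"Using this outline and research, write a first draft.\n"
                f"Outline:\n{outline}\nResearch:\n{research}")
    for _ in range(max_revisions):
        # Read the draft and decide what needs revision or more research.
        critique = llm(f"List the parts of this draft that need revision:\n{draft}")
        draft = llm(f"Revise the draft to address the critique.\n"
                    f"Draft:\n{draft}\nCritique:\n{critique}")
    return draft
```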
One of the key skills you'll learn in this course is how to take a complex task, like writing an essay, and break it down into smaller steps for agentic workflows to execute one step at a time, to then get the work output that you want. Knowing how to decompose a task into steps, and how to build the components to execute the individual steps well, turns out to be a tricky but important skill that will determine your ability to build agentic workflows for a huge range of exciting applications.

In this course, a running example that we'll use, and something that you'll build alongside me, is a research agent. Here's an example of what it will look like. You can enter a research topic like: how do I build a new rocket company to compete with SpaceX? I don't personally want to compete with SpaceX, but if you want to, you can try asking a research agent to help with your background research. This agent starts by planning out what research to do, including calling a web search engine to download some web pages, and then synthesizes and ranks findings, drafts an outline, has an editor agent review for coherence, and finally generates a comprehensive markdown report, which it has done here: building a new rocket company to compete with SpaceX, with an intro, background, findings, and so on. I think it points out, appropriately, that this is going to be a tough startup to build, so I'm not personally planning to do this, but if you want to tackle something like this, maybe a research agent like this could help you with some initial research. By finding and downloading multiple sources and deeply thinking about them, this actually ends up with a much more thoughtful report than just prompting an LLM to write an essay for you would.

One of the reasons I'm excited about this is that in my work, I've ended up building quite a few specialized research agents, be it for legal documents and legal compliance, or for some healthcare sectors, or some business product research areas. So I hope that in working through this example, you not only learn how to build agentic workflows for many other applications, but that some of the ideas in building research agents will be directly useful to you if you ever need to build a custom research agent yourself.

Now, one of the often discussed aspects of AI agents is how autonomous they are. What you just saw here was a relatively complex, highly autonomous Agentic AI workflow, but there are also other, simpler workflows that are incredibly valuable. Let's go on to the next video to talk about the degree to which agentic workflows can be autonomous; it gives you a framework to think about how you might go about building different applications, and how easy or difficult they might be. See you in the next video.
1.2 Degrees of autonomy
Agents can be autonomous to different degrees. A few years ago, I noticed within the AI community that there was a growing, controversial debate about what is an agent: some people would write a paper saying, I built an agent, and others would say, no, that's not really a true agent. I felt this debate was unnecessary, which is why I started using the term agentic, because I thought if we use it as an adjective rather than a binary (it's either an agent or not), then we have to acknowledge that systems can be agentic to different degrees. And let's just call it all agentic and move on with the real work of building these systems, rather than debating, you know, is this sufficiently autonomous to be an agent or not? I remember when I prepared a talk on agentic reasoning, one of my team members actually came to me and said, hey, Andrew, we don't need yet another word. You know, we have agent; why are you making up another word, agentic? But I decided to use it anyway. Later on, I wrote an article in my newsletter, The Batch, and also posted on social media, saying that instead of arguing over what work to include or exclude as being a true agent, let's acknowledge the different degrees to which systems can be agentic. I think this helped move past the debate on what is a true agent and let us just focus on actually building them.

Some agents can be less autonomous. Take the example of writing an essay about black holes. You can have a relatively simple agent come up with a few web search terms or web search queries. Then you can hard-code that it calls a web search engine, fetches some web pages, and then uses those to write an essay. This is an example of a less autonomous agent with a fully deterministic sequence of steps, and it will work okay.

In terms of notational convention, throughout this course I'll use the red color, as you see here on the left, to denote the user input, such as a user query in this case, or in later examples maybe the input document into an agentic workflow. The gray boxes denote calls to an LLM, and the green boxes, like the web search and web fetch boxes that you see here, indicate steps where other software is being used to carry out an action, such as a web search API call or executing code to fetch the contents of a website.

An agent can also be more autonomous, where, given a request to write an essay about black holes, perhaps you let the LLM decide: does it want to do a web search, or search recent news sources, or search for recent research papers on the website arXiv? Based on that, maybe in this example the LLM, not the human engineer but the LLM, chooses to call a web search engine. After that, you may let the LLM decide how many web pages it wants to fetch, or, if it fetches a PDF, whether it needs to call a function, also called a tool, to convert the PDF to text. In this case, maybe it fetches the top few web pages, then it can write an essay, decide whether to reflect and improve, maybe even go back to fetch more web pages, and then finally produce an output.

So even for this example of a research agent, we can see that some agents can be less autonomous, with a linear sequence of steps to be executed, determined by a programmer, and some can be more autonomous, where you trust the LLM to make more decisions, and the exact sequence of steps that happens may even be determined by the LLM rather than in advance by the programmer.
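To make the contrast concrete, here is a rough sketch of the more autonomous version, where the LLM itself picks the next action from a menu of tools. The `llm()` helper and all the tool functions are hypothetical placeholders; in practice you would use your provider's function-calling API and add error handling.

```python
import json

# Hypothetical tool implementations; each would wrap a real API.
TOOLS = {
    "web_search": web_search,      # general web search
    "news_search": news_search,    # recent news sources
    "arxiv_search": arxiv_search,  # recent research papers
    "fetch_page": fetch_page,      # download a web page
    "pdf_to_text": pdf_to_text,    # convert a fetched PDF to text
}

def autonomous_essay_agent(request: str, max_steps: int = 10) -> str:
    notes: list[str] = []
    for _ in range(max_steps):
        # The LLM, not the programmer, picks the next step.
        decision = json.loads(llm(
            f"Task: {request}\nNotes so far: {notes}\n"
            f"Available tools: {list(TOOLS)}.\n"
            'Reply as JSON: {"action": "<tool name or finish>", "arg": "..."}'
        ))
        if decision["action"] == "finish":
            break
        notes.append(TOOLS[decision["action"]](decision["arg"]))
    return llm(f"Write an essay for: {request}\nNotes:\n{notes}")
```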
For less autonomous systems, you will usually have all the steps predetermined in advance, and any functions it calls, like web search, which we'll call tool use, as you'll learn in the third module of this course, might be hard-coded by the human engineer, by you or me, and most of the autonomy is in what text the LLM generates. At the other end of the spectrum would be highly autonomous agents, where the agent makes many decisions autonomously, including, for example, deciding what sequence of steps it will carry out in order to write the essay. There are some highly autonomous agents that can even write new functions, or sometimes create new tools, that they can then execute. Somewhere in between are semi-autonomous agents, which can make some decisions and choose tools, but where the tools are usually more predefined.

As you look at different examples in this course, you'll learn how to build applications anywhere on this spectrum from less to more highly autonomous. You'll find that there are tons of very valuable applications at the less autonomous end of the spectrum being built for tons of businesses today. At the same time, there are also applications being worked on at the more highly autonomous end of the spectrum, but those are usually less easily controllable and a little bit more unpredictable, and there's also a lot of active research into how to build these more highly autonomous agents. And with that, let's go on to the next video to dive deeper into this and to hear about some of the benefits of using agents, and why they allow us to do things that just were not possible with earlier generations of LLM-based applications.
1.3 Benefits of agentic workflows
I think the single biggest benefit of agentic workflows is that they allow you to do many tasks effectively that previously were just not possible. But there are other benefits as well, including parallelism, which lets you do certain things quite fast, and modularity, which lets you combine the best components from many different places to build an effective workflow. Let's take a look.

My team collected some data on a coding benchmark that tests the ability of different LLMs to write code to carry out certain tasks. The benchmark used in this case is called HumanEval, and it turns out that GPT-3.5, the model that the first publicly available version of ChatGPT was based on, if asked to write the code directly, to just type out the computer program, gets 40% right on this benchmark (this is the pass@k metric). GPT-4 is a much better model: its performance leaps to 67% with this also non-agentic workflow. But it turns out that as large as the improvement was from GPT-3.5 to GPT-4, that improvement is dwarfed by what you can achieve by wrapping GPT-3.5 within an agentic workflow. Using different agentic techniques, which you'll learn about later in this course, you can prompt GPT-3.5 to write code and then maybe reflect on the code and figure out if it can be improved. Using techniques like that, you can actually get GPT-3.5 to reach much higher levels of performance. Similarly, GPT-4 used in the context of an agentic workflow also does much better. So even with today's best LLMs, an agentic workflow lets you get much better performance. In fact, what we saw in this example was that the improvement from one generation of model to another, which is huge, is still not as big a difference as implementing an agentic workflow on the previous generation of model.

Another benefit of agentic workflows is that they can parallelize some tasks and thus do certain things much faster than a human. For example, if you ask an agentic workflow to write an essay about black holes, you might have three LLMs run in parallel to generate ideas for web search terms to type into the search engine. Based on the first web search, it may identify, say, three top results to fetch. Based on the second web search, it may identify a second set of web pages to fetch, and so on. It turns out that whereas a human doing this research would have to read these nine web pages sequentially, one at a time, an agentic workflow can actually parallelize all nine web page downloads and then finally feed all of them into an LLM to write an essay. So even though agentic workflows do take longer than truly non-agentic workflows, that is, direct generation by just prompting a single time, if you compare this type of agentic workflow to how a human would have to go about the task, the ability to parallelize downloading lots of web pages can actually let it do certain tasks much faster than the non-parallel, sequential way that a single human might process this data.
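As a rough sketch of that parallel step, here is how the nine downloads might be fanned out with Python's standard-library thread pool; `fetch_page` and `llm` are hypothetical helpers standing in for your HTTP and model calls.

```python
from concurrent.futures import ThreadPoolExecutor

def research_in_parallel(topic: str, urls: list[str]) -> str:
    # Download all pages concurrently instead of one at a time,
    # the way a single human reader would have to.
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        pages = list(pool.map(fetch_page, urls))  # hypothetical fetch helper
    return llm(f"Write an essay about {topic} using these sources:\n"
               + "\n---\n".join(pages))
```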
To build on this example, one of the things I often do when building agentic workflows is look at the individual components, like the LLM, and add or swap out components. For example, maybe I look at the web search engine I use up here and decide that I want to swap in a new web search engine. When building agentic workflows, there are actually multiple web search engines to choose from, including Google, which you can access via an API, as well as others like Bing, DuckDuckGo, Tavily, and You.com. There are quite a lot of options for web search engines designed for LLMs to use. Or maybe, instead of just doing three web searches, on this step we can swap in a news search engine, so we can find out the latest news on recent breakthroughs in black hole science. And lastly, instead of using the same LLM for all of the different steps, I will often try out different large language models, and maybe different LLM providers, to see which one gives the best result for different steps of this system.
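One lightweight way to support this kind of swapping is to treat the model and search engine for each step as configuration rather than hard-coded calls. This is just an illustrative pattern; `llm`, `tavily_search`, `news_search`, and the model names are hypothetical.

```python
# Hypothetical per-step configuration: swapping a component means
# editing this table, not rewriting the workflow.
CONFIG = {
    "outline":  {"model": "provider-a/large-model"},
    "search":   {"engine": tavily_search},   # swap in news_search, etc.
    "drafting": {"model": "provider-b/fast-model"},
}

def draft_essay(topic: str) -> str:
    outline = llm(f"Outline an essay on {topic}",
                  model=CONFIG["outline"]["model"])
    sources = CONFIG["search"]["engine"](topic)
    return llm(f"Write the essay.\nOutline: {outline}\nSources: {sources}",
               model=CONFIG["drafting"]["model"])
```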
So to summarize, the main reason I use agentic workflows is that they just give much better performance on many different applications. In addition, they can parallelize some tasks that humans would otherwise have to do sequentially. And the modular design of many agentic workflows lets us add or update tools and sometimes swap out models. We've talked a lot about the key components of building agentic workflows. Let's now take a look at a range of Agentic AI applications, to give you a sense of the sorts of things people are already building and the sorts of things you'll build yourself. Let's go on to the next video.
1.4 Agentic AI applications
Let's take a look at some examples of Agentic AI applications. One task that many businesses carry out is invoice processing. Given an invoice like this, you might want to write software to extract the most important fields, which for this application, let's say, are the biller (that would be TechFlow Solutions), the biller address, the amount due, which is $3,000, and the due date, which looks like it is August 20th, 2025. In many finance departments, a human would look at invoices, identify the most important fields, who we need to pay and by when, and record these in a database to make sure that payment is issued in time.

If you were to implement this with an agentic workflow, you might do so like this. You input an invoice, then call a PDF-to-text conversion API to turn the PDF into formatted text, such as markdown, for the LLM to ingest. Then the LLM looks at the document and figures out: is this actually an invoice, or is it some other type of document that should just be ignored? If it is an invoice, it pulls out the required fields and uses an API, or a tool, to update the database, saving the most important fields in the database records. One aspect of this agentic workflow is that there is a clear process to follow: identify the required fields and record them in the database. Tasks like these, with a clear process you want followed, tend to be easier for agentic workflows to carry out, because the clear process leads to a relatively step-by-step way to reliably carry out the task.
out this task. Here's another example, maybe just a little bit harder. So if you want to build an
1:41
agent to respond to basic customer order inquiries, then the steps might be to extract the key
1:48
information, so figure out what exactly did the customer order, what's the customer's name, then
1:53
look up the relevant customer records, and then finally draft a response for human to review before
2:00
the email response is sent to the customer. So again, there's a clear process here and we
2:04
will implement this step-by-step, where we take the email, feed it to an LLM to verify or to extract
2:10
the order details, and assuming the customer email is about an order, the LLM might then choose to
2:16
call an order's database to then pull up that information. That information then goes to the LLM
2:22
to then draft an email response, and the LLM might choose to use a request review tool that, say, puts
2:29
this draft email from the LLM into queue for humans to review, so they can then be sent out after a
2:34
human has reviewed and approved it. So customer order inquiry agents like these are being built
2:40
and deployed in many businesses today. To look at a more challenging example, if you want to build a
2:45
customer service agent to respond not just to questions about an order they place, but to respond
2:51
to a more general set of questions, anything a customer may ask, and maybe the customer will ask,
2:57
do you have any black jeans or blue jeans? And to answer this question, you need to maybe make
3:03
multiple API calls to your database to first check the inventory for black jeans, then check inventory
3:08
for blue jeans, and then respond to the customer. So this is an example of a more challenging query,
3:14
where given a user input, you actually have to plan out what is the sequence of database queries
3:19
to check for inventory. Or if a user asks, I'd like to return the beach towel I bought, then to answer
3:25
this, maybe we need to verify that the customer actually bought a beach towel, and then double
3:31
check the return policy. Maybe our set returns only 30 days within the date of purchase, and only the
3:36
towel was unused. And if return is allowed, then have the agent issue a return packing slip, and
3:42
also set the database record to return pending. So in this example, if the required steps to process
3:48
the customer requests are not known ahead of time, then it results in a more challenging process,
3:54
where the LLM base application has to decide for itself that these are the three steps needed in
4:00
order to respond appropriately to this task. But you learn about some of the latest work on how to
4:06
approach this type of problem too. And to give one last example of maybe an especially difficult
4:12
type of agent to build, there's a lot of work on computer use by agents, in which agents will
4:17
attempt to use a web browser and read a web page to figure out how to carry out a complex task.
4:24
In this example, I've asked an agent to check whether seats are available on two specific United
4:30
Airlines flights from San Francisco to Washington DC, or the DCA airport. The agent has access to
4:36
a web browser they can use to carry out this task. And in the video here, you can see it navigating
4:41
the United website independently, clicking on page elements and filling in the text fields on the page
4:46
to carry out the search that I requested. As it works, the agent reasons over the content of the
4:52
page to figure out the actions it needs to take to complete the task, and what it should do next.
4:57
In this case, there's some trouble checking flights on the United site, and instead decides to
5:02
navigate to the Google Flights website to search for available flights. On the Google Flight, you
5:08
see here it finds several flight options that match the user's query, and the agent then picks one and
5:13
is taken back to the United website, where it looks like it's now on the correct web page, and so is
5:20
able to determine that yes, there are seats available on the flights that I asked about. So computer use
5:26
is an exciting cutting-edge area of research right now, and many companies are trying to get computer
5:30
use agents to work. While the agent you saw here did eventually figure out the answer, I often see
5:36
agents having trouble using web browsers well. For example, if a web page is slow to load, an agent
5:42
may fail to understand what's going on, and many web pages are still beyond agents' abilities to
5:47
pause or to read accurately. But I think computer use agents, even though not yet reliable enough
5:53
to use mission-critical applications today, are an exciting and important area of future development.
5:58
So when I'm considering building Agentic AI workflows, the tasks that are easier will tend to be ones
6:05
where there is a clear step-by-step process, or if a business already has a standard procedure, a
6:11
standard offering procedure to follow, and then it can be quite a lot of work to take that procedure
6:15
and codify it up in an AI agent, but that tends to lead to easier implementations. One thing that
6:21
makes it easier is if you are using text-only assets, because LLM/language models have
6:27
grown up really processing text, and if you need to process other input modalities, it may well be
6:33
doable, but it maybe gets a little bit harder. And on the harder end of the spectrum, if the steps are
6:38
not known ahead of time of what's needed to carry out a task, like you saw for the more advanced
6:42
customer service agent, then the agent may need to plan or solve as you go, and this tends to be
6:47
harder and more unpredictable and less reliable. And then as mentioned, if it needs to accept rich
6:52
multi-modal inputs such as sound, vision, audio, that also tends to be less reliable than the
6:58
only header process text. So I hope that gives you a sense of the types of applications you might build
7:05
with agentic workflows. When implementing one of these things yourself, one of the most important
7:10
skills is to look at a complex workflow and figure out what are the individual steps so you can
7:16
implement an agentic workflow to execute those steps one at a time. In the next video, we'll talk
7:22
about task decomposition, that is, given a complex thing you want to do, like write a research report
7:28
or have a customer agent get back to customers, how do you break that down into discrete steps
7:34
to try to implement an agentic workflow? Let's go see that in the next video.
1.5 Task decomposition: Identifying the steps in a workflow
People and businesses do a lot of stuff. How do you take this useful stuff that we do and break it down into discrete steps for an agentic workflow to follow? Let's take a look.

Take the example of building a research agent. If you want an AI system to write an essay on a topic X, one thing you could do is prompt an LLM to generate an output directly. But if you were to do this for topics that you want deeply researched, you may find that the LLM output covers only the surface-level points, or maybe covers only the obvious facts, but doesn't go as deep into the subject as you want it to. In this case, you might then reflect on how you as a human would write an essay on a certain topic. Would you just sit down and start writing, or would you take multiple steps, such as first writing an essay outline, then searching the web, and then, based on the input from the web search, writing the essay? As I take a task and decompose it into steps, one question I'm always asking myself is: if I look at these steps one, two, and three, can each of them be done either by an LLM, or by a short piece of code, or by a function call, or by a tool? In this case, I think an LLM can maybe write a decent outline on many topics that I would want it to help me think through, so it's probably okay on the first step. I know how to use an LLM to generate search terms to search the web, so I would say the second step is also doable. And then, based on the web search, I think an LLM could take the web search results as input and write an essay. So this would be a reasonable first attempt at an agentic workflow for writing an essay that goes deeper than just direct generation.

But if I were to then implement this agentic workflow and look at the results, maybe I'd find that the results still aren't good enough: still not as deeply thoughtful as I'd like, and maybe the essays feel a little bit disjointed. This has actually happened to me. I once built a research agent using this workflow, but when I read the output, it felt a bit disjointed. You know, the start of the article didn't feel completely consistent with the middle, which didn't feel completely consistent with the end. In this case, what you might do is reflect on how you would change the workflow if you as a human found that the essay was a little bit disjointed. One thing you could do is take the third step, write the essay, and further decompose it into additional steps. So instead of writing the essay in one go, you might instead have it write the first draft, then consider what parts need revision, and then revise the draft. This is how I as a human might go about it: not just write the final essay on my first attempt, but write a first draft and then read over it, which is another step that the LLM is pretty decent at, and then, based on my own critique of my own essay, revise the draft. So to recap: I started off with direct generation, just one step; decided it wasn't good enough, and so broke it down into three steps; then maybe decided that still wasn't good enough, and took one of the steps and further decomposed it into three more steps, resulting in this more complex, richer process for generating an essay. And depending on how satisfied you are with the results of this process, you may choose to modify this essay generation process even further.

Let's look at a second example of how to decompose complex tasks into smaller steps. Take the example of responding to basic customer order inquiries. The first step that a human customer service specialist might carry out would be to extract the key information, such as who this email is from, what they ordered, and what the order number is. These are things that an LLM can do, so I could just say, let's have an LLM do that. The second step would be to find the relevant customer records, that is, to generate the relevant database queries to pull up what the customer had ordered, when it shipped, and so on. I think an LLM with the ability to call a function to query the orders database should be able to do that. And lastly, having pulled up the customer order record, I might then write and send a response back to the customer. I think that with the information we pulled up, this third step is also doable with an LLM if I give it the option to call an API to send an email. So this would be another example of taking a task, responding to customer email, and breaking it down into three individual steps, where I can look at each of these steps and say: yep, I think an LLM, or an LLM with the ability to call a function to query a database or send an email, should be able to do that.
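Sketched as code, those three steps might look like this, assuming hypothetical `llm`, `query_orders_db`, and `send_email` helpers; the point is that each decomposed step maps to one LLM call or one tool call.

```python
import json

def handle_order_inquiry(email_text: str) -> None:
    # Step 1: an LLM extracts the key information from the email.
    info = json.loads(llm(
        'Extract JSON {"customer": ..., "order_number": ...} '
        "from this email:\n" + email_text
    ))
    # Step 2: a tool call pulls up the relevant customer records.
    order = query_orders_db(info["order_number"])   # hypothetical DB tool
    # Step 3: an LLM drafts the reply, and a tool sends it.
    reply = llm(f"Draft a reply to {info['customer']} about this order "
                f"record: {order}\nOriginal email:\n{email_text}")
    send_email(to=info["customer"], body=reply)     # hypothetical email API
```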
Just one last example, for invoice processing. After a PDF invoice has been converted to text, the first step is to pull out the required information: the name of the biller, the address, the due date, the amount due, and so on. An LLM should be able to do that. Then, if I want to check that the information was extracted and save it in a new database entry, I think an LLM should be able to help me call a function to update the database record. So to implement this, we implement an agentic workflow to carry out basically these two steps.

When building agentic workflows, I think of myself as having a number of building blocks. One important building block is large language models, or maybe large multimodal models if I want to process images or audio as well. LLMs are good at generating text, deciding what to call, and maybe extracting information. For some highly specialized tasks, I might also use other AI models, such as a model for converting a PDF to text, or for text-to-speech, or for image analysis. In addition to AI models, I also have access to a number of software tools, including different APIs that I can call to do web search, to get maybe real-time weather data, to send emails, check calendars, and so on. I might also have tools to retrieve information, to pull up data from a database, or to implement RAG, or retrieval-augmented generation, where I can look through a large text database and find the most relevant text. And I might have tools to execute code; this is a tool that lets an LLM write code and then run that code on your computer to do a huge range of things. In case some of these tools seem a bit foreign to you, don't worry about it; we'll go through the most important tools in much greater detail in a later module. But I think of a lot of my work, when I'm building an agentic workflow, as looking at the work that a person or business is doing and then trying to figure out, with these building blocks, how I can sequence them together in order to carry out the tasks that I want my system to carry out. This is why having a good understanding of what building blocks are available, which I hope you'll have a better sense of by the end of this course, will allow you to better envision what agentic workflows you can build by combining these building blocks together.

So to summarize, one of the key skills in building agentic workflows is to look at a bunch of stuff that someone does and to identify the discrete steps with which it could be implemented. When I'm looking at the individual discrete steps, one question I'm always asking myself is: can this step be implemented with either an LLM, or with one of the tools, such as an API or a function call, that I have access to? In case the answer is no, I'll often ask myself: how would I as a human do this step, and is it possible to decompose it further, to break it down into even smaller steps that are maybe more amenable to implementation with an LLM or with one of the software tools that I have? So I hope this gives you a rough sense of how to think about task decomposition. In case you feel like you don't fully have it yet, don't worry; we'll go through many more examples in this course, and you'll have a much better understanding of this by the end.

It turns out that as you build agentic workflows, you'll find that often you build an initial task decomposition, an initial agentic workflow, and then you want to keep on iterating and improving it quite a few times until it delivers the level of performance that you want. To drive this improvement process, which I've found important for many projects, one of the key skills is knowing how to evaluate your agentic workflow. So in the next video, we'll talk about evaluations, or evals, which are a key component of how you can build, and then keep on improving, your workflows to get the performance that you want. Let's talk about evals in the next video.
1.6 Evaluations (evals)
I've worked with many different teams on building agentic workflows, and I've found that one of the biggest predictors of whether someone is able to do it really well, versus being less effective at it, is whether or not they're able to drive a really disciplined evaluation process. So your ability to drive evals for your agentic workflow makes a huge difference in your ability to build them effectively. In this video, we'll take a quick overview of how to build evals; this is a subject that we'll go into much more deeply in a later module of this course. So let's take a look.

After building an agentic workflow like this one for responding to customer order inquiries, it turns out that it's very difficult to know in advance what the things are that could go wrong. So rather than trying to build evaluations in advance, what I recommend is that you just look at the outputs and manually look for things that you wish it were doing better. For example, maybe you read a lot of outputs and find that it is unexpectedly mentioning your competitors more than it should. Many businesses don't want their agents to mention competitors, because it just creates an awkward situation. And if you read some of these outputs, maybe you find that it sometimes says: I'm glad you shopped with us. We're much better than our competitor, ComproCo. Or maybe it sometimes says: Sure, that should be fine. Unlike RivalCo, we make returns easy. And you may look at this and go, gee, I really don't want this to mention competitors. This is an example of a problem that is really hard to anticipate before building the agentic workflow. So the best practice is really to build it first, then examine the output to figure out where it is not yet satisfactory, and then find ways to evaluate, as well as improve, the system to eliminate the ways in which it is still not satisfactory.

Assuming your business considers it an error or a mistake to mention competitors in this way, then as you work on eliminating these competitor mentions, one way to track progress is to add an evaluation, or eval, to track how often this error occurs. If you have a list of named competitors, like ComproCo and RivalCo, then you can actually write code to just search your own output for how often it mentions these competitors by name, and count up, as a fraction of the overall responses, how frequently it mistakenly mentions competitors. One nice thing about the problem of competitor mentions is that it's an objective metric, meaning either the competitor was mentioned or not. And for objective criteria, you can write code to check how often this specific error occurs.
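As a small illustration, an objective, code-based eval for this could be as simple as the following; the competitor names and sample responses are placeholders.

```python
# A minimal code-based eval for an objective criterion: what fraction
# of responses mention a competitor by name? Names are placeholders.
COMPETITORS = ["ComproCo", "RivalCo"]

def competitor_mention_rate(responses: list[str]) -> float:
    flagged = sum(
        any(name.lower() in r.lower() for name in COMPETITORS)
        for r in responses
    )
    return flagged / len(responses)

responses = ["Glad you shopped with us! Unlike RivalCo, we make returns easy.",
             "Your order shipped today."]
print(competitor_mention_rate(responses))  # 0.5
```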
But because LLMs output free text, there are also going to be criteria by which you want to evaluate the output that may be more subjective, where it's harder to just write code that outputs a black-and-white score. In this case, using an LLM as a judge is a common technique to evaluate the output. For example, if you're building a research agent to do research on different topics, you can use another LLM and prompt it to, say, assign the following essay a quality score between 1 and 5, where 1 is the worst and 5 is the best essay. Here, I'm using a Python expression to mean: copy and paste the generated essay into this prompt. So you can prompt the LLM to read the essay and assign it a quality score. Then I'm going to ask the research agent to write a number of different research reports, for example on recent developments in black hole science, or on using robots to harvest fruit. In this example, maybe the judge LLM assigns the essay on black holes a score of 3 and the essay on robot harvesting a score of 4, and as you work on improving your research agent, hopefully you see these scores go up over time.
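A bare-bones version of this kind of LLM-as-judge eval might look like the sketch below, with `llm` again a hypothetical helper; as discussed next, a raw 1-to-5 score is a crude first cut.

```python
def judge_essay(essay: str) -> int:
    # LLM-as-judge: prompt one model to score the output of another.
    # The {essay} placeholder is the Python expression mentioned above.
    score = llm(
        "Assign the following essay a quality score between 1 and 5, "
        "where 1 is the worst and 5 is the best essay. "
        f"Reply with only the number.\n\nEssay:\n{essay}"
    )
    return int(score.strip())
```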
It turns out, by the way, that LLMs are actually not that good at these 1-to-5 scale ratings. You can give it a shot, but I personally tend not to use this technique much myself. In a later module, you'll learn some better techniques for getting an LLM to output more accurate scores than asking for a rating on a 1-to-5 scale, although some people do use this as an initial cut at an LLM-as-judge type of eval.

Just to give a preview of some of the Agentic AI evals you'll learn about later in this course: you've already heard me talk about how you can write code to evaluate objective criteria, such as whether it mentioned a competitor or not, or use an LLM as a judge for more subjective criteria, such as the quality of an essay. But later, you'll learn about two major types of evals. One is end-to-end evals, where you measure the output quality of the entire agent; the other is component-level evals, where you measure the quality of the output of a single step in the agentic workflow. It turns out that these are useful for driving different parts of your development process. One thing I also do a lot is just examine the intermediate outputs, or what we sometimes call the traces, of the LLM, in order to understand where it is falling short of my expectations. We call this error analysis: we read through the intermediate outputs of every single step to try to spot opportunities for improvement. It turns out that being able to do evals and error analysis is a really key skill, so we have much more to say about this in the fourth module of this course.

We're nearly at the end of this first module. Before moving on, I just want to share with you what I think are the most important design patterns for building agentic workflows. Let's go take a look at that in the next video.
1.7 Agentic design patterns
We build agentic workflows by taking building blocks and putting them together to sequence out these complex workflows. In this video, I'd like to share with you a few of the key design patterns, which are patterns for how you can think about combining these building blocks into more complex workflows. Let's take a look. I think four key design patterns for building agentic workflows are reflection, tool use, planning, and multi-agent collaboration. Let me briefly go over what they mean; we'll go through most of these in depth later in this course as well.

The first of the major design patterns is reflection. I might go to an LLM agent and ask it to write code, and the LLM might then generate code like this; it defines here a Python function to do a certain task. I could then construct a prompt that looks like this: I say, here's code intended for a certain task, copy-paste whatever the LLM had just output back into the prompt, and then ask it to check the code carefully for correctness, style, and efficiency, and give constructive criticism. It turns out that the same LLM, prompted this way, may be able to point out some problems with the code. And if I then take this critique and feed it back to the model, saying, looks like this is a bug, could you change the code to fix it, then it may actually come up with a better version of the code. To give a preview of tool use: if you're able to run the code and see where it fails, then feeding that back to the LLM can also let it iterate and generate a much better, say, v3 (version 3) of the code. So reflection is a common design pattern where you ask the LLM to examine its own outputs, or maybe bring in some external sources of information, such as running the code and seeing if it generates any error messages, and use that as feedback to iterate again and come up with a better version of its output. This design pattern isn't magic; it does not result in everything working 100% of the time. But sometimes it can give a nice bump in the performance of your system.
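Here is a minimal sketch of that reflection loop, again assuming a hypothetical `llm(prompt)` helper:

```python
def write_code_with_reflection(task: str, rounds: int = 2) -> str:
    code = llm(f"Write a Python function for this task: {task}")
    for _ in range(rounds):
        # The same model, prompted as a critic, examines its own output.
        critique = llm(
            f"Here's code intended for this task: {task}\n\n{code}\n\n"
            "Check the code carefully for correctness, style, and "
            "efficiency, and give constructive criticism."
        )
        # Feed the critique back to get an improved version.
        code = llm(f"Revise the code to address this critique.\n"
                   f"Code:\n{code}\nCritique:\n{critique}")
    return code
```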
Now, I've drawn this as if it were a single LLM that I'm prompting, but to foreshadow multi-agent workflows: instead of having the same model critique itself, you can imagine having a critic agent. All that is, is an LLM that's been prompted with instructions like: your role is to critique code; here's code intended for a task; check the code carefully; and so on. This second, critic agent may point out errors or run unit tests. By having two simulated agents, where each agent is just an LLM prompted to take on a certain persona, you can have them go back and forth, iterating to get a better output.

In addition to the reflection pattern, the second important design design pattern is tool use, where LLMs are given tools, meaning functions that they can call in order to get work done. For example, if you ask an LLM, what's the best coffee maker according to reviewers, and you give it a web search tool, then it can actually search the internet to find much better answers. Or a code execution tool: if you ask a math question like, if I invest $100 at compound interest, what do I have at the end, it can write and execute code to compute an answer. Today, developers have given LLMs many different tools for everything from math and data analysis, to gathering information by fetching things from the web or from various databases, to interfacing with productivity apps like email and calendars, to processing images, and much more. And the ability of an LLM to decide what tools to use, meaning what functions to call, lets the model get a lot more done.
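Here is one common shape for tool use, sketched with placeholder tool bodies and the same hypothetical `llm` helper: the model is told what functions exist, replies with which one to call and with what argument, and the application executes the call and feeds the result back. Real providers offer native function-calling APIs that formalize this exchange.

```python
import json

# Placeholder tools; real versions would wrap a search API and a sandbox.
def web_search(query: str) -> str: ...
def run_python(code: str) -> str: ...

TOOLS = {"web_search": web_search, "run_python": run_python}

def answer_with_tools(question: str) -> str:
    # The model decides which function to call, if any.
    decision = json.loads(llm(
        f"Question: {question}\n"
        f"You may call one of these tools: {list(TOOLS)}.\n"
        'Reply as JSON: {"tool": "<name or none>", "arg": "..."}'
    ))
    if decision["tool"] in TOOLS:
        result = TOOLS[decision["tool"]](decision["arg"])
        return llm(f"Question: {question}\nTool result: {result}\nAnswer:")
    return llm(question)
```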
The third of the four design patterns is planning. This is an example from a paper called HuggingGPT, in which, if you ask a system to please generate an image where a girl is reading a book and her pose is the same as the boy in the given image, then please describe the new image with your voice, the model can automatically decide that to carry out this task, it first needs a pose-determination model to figure out the pose of the boy, then a pose-to-image model to generate a picture of a girl in that pose, then an image-to-text model to describe the new image, and finally text-to-speech. So in planning, an LLM decides what sequence of actions it needs to take, in this case a sequence of API calls, so that it can carry out the right sequence of steps, in the right order, to complete the task. Rather than the developer hard-coding the sequence of steps in advance, this lets the LLM decide what steps to take. Agents that plan are today harder to control and somewhat more experimental, but sometimes they can give really delightful results.

And then, finally, multi-agent workflows. Just as a human manager might hire a number of people to work together on a complex project, in some cases it might make sense for you to build a set of multiple agents, maybe each of which specializes in a different role, and have them work together to accomplish a complex task. The picture you see here on the left is taken from a project called ChatDev, a software framework created by Chen Qian and collaborators. In ChatDev, multiple agents with different roles, like chief executive officer, programmer, tester, designer, and so on, collaborate together as if they were a virtual software company, and can collaboratively complete a range of software development tasks. Let's consider another example. If you want to write a marketing brochure, maybe you'd think of hiring a team of three people: a researcher to do online research, a marketer to write the marketing text, and finally an editor to edit and polish the text. In a similar way, you might consider building a multi-agent workflow in which you have a simulated researcher agent, a simulated marketer agent, and a simulated editor agent that come together to carry out this task for you. Multi-agent workflows are more difficult to control, since you don't always know ahead of time what the agents will do, but research has shown that they can result in better outcomes for many complex tasks, including things like writing biographies or deciding on chess moves to make in a game. You'll learn more about multi-agent workflows later in this course as well.

And so with that, I hope you have a sense of what agentic workflows can do, as well as of the key challenges of finding building blocks and putting them together, maybe via these design patterns, in order to implement an agentic workflow, and of course also developing evals so you can see how well your system is doing and keep on improving it. In the next module, I'd like to share with you a deep dive into the first of these design patterns, reflection, and you'll find that it's a maybe surprisingly simple-to-implement technique that can sometimes give the performance of your system a very nice bump. So let's go on to the next module to learn about the reflection design pattern.