Module 3: Tool Use

3.1 What are tools?

0:00
In this module, you learn about tool use by LLMs, and that means letting your LLM decide when it might want to request to have a function called to take some action, or gather some information, or do something else.
0:12
Just as we humans can do a lot more with tools than with just our bare hands, LLMs can also do a lot more with access to tools.
0:23
But rather than hammers and spanners and pliers, the tools we give an LLM are functions it can request to have called, and that's what lets it do a lot more.
0:33
Let's take a look.
0:34
If you were to ask an LLM that's been trained maybe many months ago, what time is it right now?
0:40
Well, that trained model does not know exactly what time it is, and so hopefully it responds, sorry, I do not have access to the current time.
0:48
But if you were to write a function and give the LLM access to this function, then that lets it respond with a more useful answer.
0:56
When we let LLMs call functions, or more precisely, let an LLM request to call functions, that's what we mean by tool use, and the tools are just functions that we provide to the LLM that it can request to call.
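For example, a tool like the getCurrentTime function discussed next could be as simple as the sketch below; this is an illustrative stand-in rather than the exact function from the slides.

    # A minimal sketch of a tool: a plain Python function the LLM can request to call.
    from datetime import datetime

    def get_current_time() -> str:
        """Return the current local time as a string, e.g. '15:20'."""
        return datetime.now().strftime("%H:%M")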
1:08
In detail, this is how tool use works.
1:11
In this example, I'm going to give the getCurrentTime function that I showed on the previous slide to the LLM.
1:18
When you then prompt it, what time is it, the LLM can decide to call the getCurrentTime function.
1:23
That will return the current time, which is then fed back to the LLM in the conversational history, and finally the LLM can output, it is, say, 3:20 p.m.
1:33
So the sequence of steps is, there's the input prompt.
1:36
The LLM looks at the set of tools available, which is just one tool in this example, and it will decide in this case to call the tool.
1:47
The tool is a function that then returns a value, that value is fed back to the LLM, and then finally the LLM generates its output.
1:54
Now, one important aspect of tool use is, we can leave it up to the LLM to decide whether or not to use any of the tools.
2:03
So for the same setup, if I was asking it, how much caffeine is in green tea, the LLM doesn't need to know the current time to answer this, and so it can generate an answer directly,
2:12
green tea typically has this much caffeine, and it does so without invoking the getCurrentTime function.
2:18
In my slides, I'm going to use this notation with this dashed box on top of the LLM to indicate that we're providing a set of tools to the LLM for the LLM to choose to use when it deems appropriate.
2:30
This is as opposed to some examples you saw in earlier videos, where I, as a developer, had hard-coded in, for example, that I will always do a web search at this point in the research agent.
2:41
In contrast, the getCurrentTime function call is not hard-coded in, it's up to the LLM to decide whether or not it wants to request a call to the getCurrentTime function.
2:51
And again, we're going to use this dashed box notation to indicate when we're giving one or more tools to the LLM for the LLM to decide what tools, if any, it wants to call.
3:02
Here are some more examples of when tool use may help an LLM-based app generate better answers.
3:08
If you were to ask it, can you find some Italian restaurants near Mountain View, California?
3:12
If it has a web search tool, then an LLM might elect to call a web search engine with the query, restaurants near Mountain View, California, and use the results it fetches to generate the output.
3:23
Or if you are running a retail store and you want to be able to answer questions like, show me customers who bought white sunglasses, if your LLM is given access to a query database tool, then it might look up the table of sales for entries where a pair of white sunglasses was sold and then use that to generate the output.
3:44
Finally, if you wanted to do an interest rate calculation: if I were to deposit $500 at an interest rate of 5%, what would I have after 10 years?
3:53
If you happen to have an interest calculation tool, then it could invoke the interest calculation function to calculate that.
4:01
Or, it turns out, one thing you'll see later is letting an LLM write code, say just a mathematical expression like this, and then evaluating it. That would be another way to let an LLM calculate the right answer.
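As a rough sketch of what such a tool could look like (this is hypothetical, not the tool shown on the slide), an interest calculation helper is only a few lines, and the worked answer for the question above comes out to about $814.

    # A hypothetical interest calculation tool; not the exact function from the video.
    def compound_interest(principal: float, rate: float, years: int) -> float:
        """Value of a deposit after compounding annually at the given rate."""
        return principal * (1 + rate) ** years

    # Example: $500 at 5% for 10 years is about $814.45.
    print(round(compound_interest(500, 0.05, 10), 2))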
4:16
So as a developer, it'll be up to you to think through what sorts of things you want your application to do, and then to create the functions, or tools, that are needed and make them available to the LLM, so that it can use the appropriate tools to complete the sorts of tasks that a restaurant recommender, a retail question answerer, or a finance assistant may need to do.
4:39
So depending on your application, you may have to implement and make different tools available to your LLM.
4:46
So far, most of the examples we've gone through made only one tool or one function available to the LLM.
4:52
But there are many use cases where you want to make multiple tools or multiple functions available to the LLM for it to choose which, if any, to call.
4:59
For example, if you're building a calendar assistant agent, you might then want it to be able to fulfill requests like, please find a free slot on Thursday in my calendar and make an appointment with Alice.
5:11
So in this example, we might make available to the LLM a tool or a function to make an appointment, that is, to send a calendar invite, to check the calendar to see when I might be free, as well as to delete the appointment if it ever wants to cancel an existing calendar entry.
5:26
And so given the set of instructions, the LLM would first decide that of the different tools available, probably the first one it should use is check calendar.
5:35
So it calls a check calendar function that will return when I am free on Thursday.
5:40
Based on that information, which is fed back to the LLM, it can then decide that the next step is to pick a slot, let's say 3 p.m., and then to call the make appointment function to send a calendar invite to Alice, as well as to add it to my calendar.
5:56
The output of that, which hopefully is a confirmation that the calendar entry was sent out successfully, is fed back to the LLM, and then lastly, the LLM might tell me your appointment is set up with Alice at 3 p.m. Thursday.
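The three calendar tools described here could be exposed to the LLM as ordinary functions. The names and signatures below are hypothetical stand-ins, just to show the shape such tools might take.

    # Hypothetical calendar tool stubs; names and signatures are illustrative only.
    def check_calendar(day: str) -> list[str]:
        """Return my free time slots on the given day, e.g. ['10:00', '15:00']."""
        ...

    def make_appointment(day: str, time: str, invitee: str) -> str:
        """Send a calendar invite for the given slot and return a confirmation."""
        ...

    def delete_appointment(day: str, time: str) -> str:
        """Cancel an existing calendar entry and return a confirmation."""
        ...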
6:08
Being able to give your LLM access to tools is a pretty big deal. It will make your applications much more powerful.
6:15
In the next video, we'll take a look at how to write functions, how to create tools to then make them available to your LLM. Let's go on to the next video.

3.2 Creating a tool

0:01
The process of how an LLM decides to call a function maybe seems a little bit mysterious
0:05
initially because an LLM is just trained to generate output text or output text tokens.
0:11
So how does that work? In this video, I'd like to walk through with you step-by-step
0:15
what the process of getting an LLM to be able to get a function called really looks like.
0:21
Let's take a look.
0:22
So tools are just code, or functions, that an LLM can request to be executed,
0:27
like this getCurrentTime function that we saw from the previous video.
0:31
Now, today's leading LLMs are all trained directly to use tools,
0:36
but I want to walk through with you what it would look like if you had to write prompts yourself
0:41
to tell it when to use tools, and this is what we had to do in an earlier era
0:46
before LLMs were trained directly to use tools.
0:49
And even though we don't do it exactly this way anymore,
0:51
this will hopefully give you a better understanding of the process,
0:54
and we'll walk through the more modern syntax in the next video.
0:57
If you've implemented this function to getCurrentTime,
1:00
then in order to give this tool to the LLM, you might write a prompt like this.
1:05
You may tell it, LLM, you have access to a tool called getCurrentTime.
1:09
To use it, I want you to print out the following text.
1:12
Print out FUNCTION in all caps and then print out getCurrentTime.
1:15
And if I ever see this text, FUNCTION in all caps and then getCurrentTime,
1:19
that's when I know you want me to call the getCurrentTime function for you.
1:23
When a user asks, what time is it?
1:25
The LLM will then realize it needs to call, or rather request a call to, the getCurrentTime function.
1:29
And so the LLM will then output what it was told.
1:32
It'll output, in all caps, FUNCTION: getCurrentTime.
1:35
Now, I then have to have written code to look at the output of the LLM
1:40
to see if this all-caps FUNCTION appears.
1:42
And if so, then I need to pull out what follows it, in this case getCurrentTime,
1:47
to figure out what function the LLM wants to call.
1:49
And then I need to write code to actually call the getCurrentTime function
1:53
and then pull out the output, which is, let's say, 8 a.m.
1:57
And then it is the developer written code, my code,
2:00
that has to take 8 a.m. and feed that time, 8 a.m.,
2:04
back into the LLM as part of this conversational history.
2:07
And the conversational history, of course, includes the initial user prompt,
2:10
the LLM's request for a function call, and so on.
2:13
And lastly, the LLM, knowing what had happened earlier,
2:17
that the user asked a question, the LLM requested a function call,
2:19
and then also that I called the function and it returned 8 a.m.
2:23
Finally, the LLM can look at all this and generate the final response,
2:26
which is, it is 8 a.m.
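If you wanted to reproduce this older, manual flow yourself, a minimal sketch might look like the code below, where call_llm is a hypothetical helper that sends the conversation to some chat model and returns its text reply.

    # A sketch of the old-style, manual tool-use loop described above.
    from datetime import datetime

    def get_current_time() -> str:
        return datetime.now().strftime("%I:%M %p")   # e.g. "08:00 AM"

    def answer_with_tool(call_llm, user_prompt: str) -> str:
        system = ("You have access to a tool called getCurrentTime. "
                  "To use it, output exactly: FUNCTION: getCurrentTime")
        messages = [{"role": "system", "content": system},
                    {"role": "user", "content": user_prompt}]
        reply = call_llm(messages)                   # call_llm is a hypothetical helper
        if "FUNCTION: getCurrentTime" in reply:
            result = get_current_time()              # developer code calls the tool
            messages += [{"role": "assistant", "content": reply},
                         {"role": "user", "content": f"getCurrentTime returned: {result}"}]
            reply = call_llm(messages)               # LLM now writes the final answer
        return reply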
2:28
So to be clear, in order to call a function,
2:31
the LLM doesn't call the function directly.
2:34
It instead outputs something in a specific format like this
2:38
that tells me that I need to call the function for the LLM
2:41
and then tell the LLM the output of the function it requested.
2:45
In this example, we had given the LLM only a single function,
2:49
but you can imagine if we gave it three or four functions,
2:52
we could tell it to output FUNCTION in all caps,
2:55
then the name of the function it wants called,
2:57
and maybe even some arguments of these functions.
3:00
In fact, now let's take a look at a slightly more complex example
3:03
where the getCurrentTime function accepts an argument for the time zone
3:08
at which you want the current time.
3:10
For this second example, I've written a function
3:14
that gets the current time in a specified time zone,
3:16
where here the time zone is the input argument
3:19
to the getCurrentTime function.
3:22
So to let the LLM use this tool to answer questions
3:25
like maybe, what time is it in New Zealand?
3:27
Because someone I know is there, so before I call her up,
3:29
I do look up what time it is in New Zealand.
3:31
To let the LLM use this tool, you might modify the system prompt
3:35
to say you can use the getCurrentTime tool for a specific time zone.
3:39
To use it, output the following: FUNCTION: getCurrentTime,
3:41
and then, you know, include the time zone.
3:43
And this is an abbreviated prompt.
3:45
In practice, you might put more details than this into the prompt
3:48
to tell it what is the function, how to use it, and so on.
3:50
In this example, the LLM will then realize
3:53
it needs to fetch the time in New Zealand,
3:56
and so it will generate output like this,
3:58
function: getCurrentTime Pacific/Auckland.
4:02
This is the New Zealand time zone
4:04
because Auckland is a major city in New Zealand.
4:06
Then I have to write code to search for whether or not
4:10
this all-caps FUNCTION appeared in the LLM output,
4:13
and if so, then I need to pull out the function to call.
4:17
Lastly, I will then call getCurrentTime
4:19
with the specified argument, which was generated by the LLM,
4:22
which is Pacific/Auckland, and maybe it returns 4 a.m.
4:25
Then as usual, I feed this to the LLM, and the LLM outputs,
4:28
It is 4 a.m. in New Zealand.
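The parsing step for this version might look something like the following sketch, which is hypothetical developer code rather than anything from the course labs.

    import re

    # Hypothetical parsing of an LLM reply such as "FUNCTION: getCurrentTime Pacific/Auckland".
    reply = "FUNCTION: getCurrentTime Pacific/Auckland"
    match = re.search(r"FUNCTION:\s*(\w+)\s+(\S+)", reply)
    if match:
        func_name, timezone = match.group(1), match.group(2)
        # func_name is "getCurrentTime" and timezone is "Pacific/Auckland";
        # developer code would now call getCurrentTime(timezone) and feed the result back.
        print(func_name, timezone)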
4:30
To summarize, here's the process for getting an LLM to use tools.
4:34
First, you have to provide the tool to the LLM,
4:37
implement the function, and then tell the LLM that it is available.
4:40
When the LLM decides to call a tool,
4:42
it then generates a specific output that lets you know
4:45
that you need to call the function for the LLM.
4:48
Then you call the function, get its output,
4:51
take the output of the function you just called,
4:53
and give that output back to the LLM,
4:55
and the LLM then uses that to go on to whatever it decides to do next,
4:59
which in our examples in this video was to just generate the final output,
5:03
but sometimes it may even decide that the next step
5:05
is to go call yet another tool, and the process continues.
5:09
Now, it turns out that this all-caps function syntax is a little bit clunky.
5:13
This is what we used to do before LLMs were trained natively
5:17
to know by themselves how to request that tools be called.
5:21
With modern LLMs, you don't need to tell it to output all-caps FUNCTION,
5:25
then search for all-caps FUNCTION, and so on.
5:27
Instead, LLMs are trained to use a specific syntax
5:31
to request very clearly when they want a tool called.
5:34
In the next video, I want to share with you
5:36
what the modern syntax actually looks like
5:38
for letting LLMs request to have tools be called.
5:42
Let's go on to the next video.

3.3 Tool syntax

0:01
Let's take a look at how to write code to have your LLM get tools called.
0:04
Here's our old getCurrentTime function without the time zone argument.
0:09
Let me show you how to use the AISuite open source library in order to have your LLM call
0:14
tools. By the way, technically, as you saw from the last video, the LLM doesn't call the tool.
0:20
The LLM just requests that you call the tool. But among developers building agentic workflows,
0:25
many of us will occasionally just say the LLM calls the tool, even though it's not technically
0:30
what happens, because it's just a shorter way to say it.
0:34
This syntax here is very similar to the OpenAI syntax for calling these LLMs, except that here,
0:41
I'm using the AISuite library, which is an open source package that some friends and I had worked
0:46
on that makes it easy to call multiple LLM providers. So here's the code syntax, and if this
0:53
looks like a lot to you, don't worry about it. You'll see more of this in the code labs.
0:57
But very briefly, this is very similar to the OpenAI syntax, where you say response equals
1:02
client.chat.completions.create, then select the model, which in this case, we'll use the
1:07
OpenAI model GPT-4o, messages equals messages, assuming you've put into an array here the
1:13
messages you want to pass the LLM, and then you say tools equals, then a list of the tools you want
1:19
the LLM to have access to. And in this case, there's just one tool, which is get current time,
1:23
and then don't worry too much about the max turns parameter. This is included because
1:28
after a tool call returns, the LLM might decide to call another tool, and after that tool call
1:33
returns, the LLM might decide to call yet another tool. So max turns is just a ceiling on how many
1:38
times you want the LLM to request one tool after another before you stop to just break out of a
1:44
possible infinite loop. In practice, you almost never hit this limit unless your code is doing
1:49
something unusually ambitious. So I wouldn't worry about the max turns parameter. I usually just set it to
1:54
five, but in practice, it doesn't matter that much. And it turns out that with AISuite, the function
2:00
get current time is automatically described to the LLM in an appropriate way to enable the LLM to
2:06
know when to call it. So rather than you needing to manually write a long prompt to tell the LLM,
2:12
when to use get current time, this syntax in AISuite does that automatically. And to make it seem not
2:18
too mysterious, the way it does that is it actually looks at the docstring associated with get current
2:23
time, that is, the comments in get current time, in order to figure out how to describe this function
2:30
to the LLM. So to illustrate how this works, here's the function again, and here's the
2:36
snippet of code using AISuite to call the LLM. Behind the scenes, what this will do is create a
2:43
JSON schema that describes the function in detail. And this over here on the right is what is actually
2:50
passed to the LLM. And specifically, it will pull the name of the function, which is get current time,
2:55
and then also a description of the function, which is pulled out from the doc string to tell the LLM
3:01
what this function does, which lets it decide when to call it. There are some APIs which require that
3:06
you manually construct this JSON schema and then pass this JSON schema to the LLM, but the AISuite
3:13
package does this automatically for you. To go through a slightly more complex example, if you
3:18
have this more complex get current time tool that also has an input time zone parameter, then AISuite
3:25
will create this more complex JSON schema where, as before, it pulls out the name of the function,
3:30
which is get current time, pulls out the description from the doc string, and then also identifies
3:35
what are the parameters and describes them to the LLM based on the documentation here shown on the
3:41
left, so that when it's generating the function arguments to call the tool, it knows that it
3:47
should be something like America/New_York or Pacific/Auckland or some other time zone.
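The slide itself isn't reproduced here, but an OpenAI-style tool schema for this function would typically look something like the Python dictionary below; the exact wording of the auto-generated description depends on your docstring.

    # Roughly what the auto-generated tool schema looks like (illustrative, not the exact slide).
    get_current_time_schema = {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get the current time in a specified time zone.",
            "parameters": {
                "type": "object",
                "properties": {
                    "timezone": {
                        "type": "string",
                        "description": "An IANA time zone name, e.g. 'America/New_York' or 'Pacific/Auckland'.",
                    },
                },
                "required": ["timezone"],
            },
        },
    }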
3:53
And so if you execute this code snippet here on the lower left, it will use the OpenAI
3:59
GPT-4o model, see if the LLM wants the function called, and if so, it'll call the function,
4:04
get the output from the function, feed that back to the LLM, and do that up to a maximum of five
4:10
turns and then return the response. Note that if the LLM requests to call the get current time
4:17
function, AISuite, or rather this client, will call get current time for you, so you don't need
4:23
to explicitly do it yourself. All that is done in this single function call that you have to write.
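Putting the pieces together, a sketch of the AISuite call described in this video might look like the code below; the provider prefix in the model name and the exact response shape are assumptions based on the description here, and may differ slightly from the version of the library used in the labs.

    import aisuite as ai
    from datetime import datetime
    from zoneinfo import ZoneInfo

    def get_current_time(timezone: str) -> str:
        """Get the current time in the specified IANA time zone, e.g. 'Pacific/Auckland'."""
        return datetime.now(ZoneInfo(timezone)).strftime("%H:%M")

    client = ai.Client()
    messages = [{"role": "user", "content": "What time is it in New Zealand?"}]

    # The docstring above is what gets turned into the tool description for the model.
    response = client.chat.completions.create(
        model="openai:gpt-4o",      # assumed provider:model naming
        messages=messages,
        tools=[get_current_time],
        max_turns=5,                # ceiling on back-to-back tool calls
    )
    print(response.choices[0].message.content)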
4:29
Just note that there are some other implementations of LLM interfaces where you have to do that step
4:36
manually, but with this particular package, this is all wrapped into this
4:41
client.chat.completions.create function call. So you now know how to get an LLM to call functions, and I hope that you enjoy
4:49
playing with this in the labs, and it's actually really amazing when you provide a few functions
4:54
to an LLM and the LLM decides to go and take action in the world, go and get more information
4:59
to fulfill your requests. If you haven't played with this before, I think you'll find this to be
5:04
really cool. It turns out that of all the tools you can give an LLM, there's one that's a bit special,
5:10
which is a code execution tool. It turns out to be really powerful if you can tell an LLM,
5:16
you can write code, and I will have a tool to execute that code for you. Because code can do
5:21
a lot of things, giving an LLM the flexibility to write code and have that code executed turns
5:28
out to be incredibly powerful. So code execution is special. Let's go
5:35
on to the next video to talk about the code execution tool for LLMs.

3.4 Code execution

0:05
In a few agentic applications I've worked on, I gave the LLM the option to write code to then
0:06
carry out the task I wanted it to. And a few times now, I've been really surprised and
0:10
delighted by the cleverness of the code solutions it generated in order to solve various tasks for
0:17
me. So if you haven't used code execution much, I think you might be surprised and delighted at
0:23
what this will let your LLM applications do. Let's take a look. Let's take an example of
0:29
building an application that takes in math word problems and solves them for you. So you might
0:36
create tools that add numbers, subtract numbers, multiply numbers, and divide numbers. And if
0:40
someone says, please add 13.2 plus 18.9, then it triggers the add tool and then it gets you the
0:46
right answer. But what if someone now types in, what is the square root of two? Well, one thing
0:50
you could do is write a new tool for square roots, but then maybe yet another tool is needed to
0:56
carry out exponentiation. And in fact, if you look at the number of buttons on your modern
1:02
scientific calculator, are you going to create a separate tool for every one of these buttons and
1:06
the many more things that we would want to do in math calculation? So instead of trying to implement
1:12
one tool after another, a different approach is to let it write and execute code. To tell the LLM
1:19
to write code, you might write a prompt like this. Write code to solve the user's query. Return your
1:24
answer as Python code delimited with execute Python and closing execute Python tags. So given a query
1:31
like what is the square root of two, the LLM might generate outputs like this. You can then use
1:37
pattern matching, for example, a regular expression to look for the start and end execute Python tags
1:44
and extract the code in between. So here you get these two lines of code shown in the green box,
1:50
and you can then execute this code for the LLM and get the output, in this case, 1.4142 and so on.
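A minimal sketch of that extract-and-execute step is shown below; the exact tag format used in the course may differ, and note the sandboxing caveats discussed later in this video.

    import io, re, contextlib

    # Hypothetical LLM output following the "execute Python" tag convention described above.
    llm_output = "<EXECUTE_PYTHON>\nimport math\nprint(math.sqrt(2))\n</EXECUTE_PYTHON>"

    match = re.search(r"<EXECUTE_PYTHON>(.*?)</EXECUTE_PYTHON>", llm_output, re.DOTALL)
    if match:
        code = match.group(1)                      # the code between the tags
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):   # capture whatever the code prints
            exec(code)                             # caution: runs arbitrary code; see sandboxing below
        result = buffer.getvalue().strip()         # e.g. "1.4142135623730951", fed back to the LLM
        print(result)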
1:57
Lastly, this numerical answer is then passed back to the LLM and it can write a nicely formatted
2:04
answer to the original question. There are a few different ways you can carry out the code
2:08
execution step for the LLM. One is to use Python's exec function. This is a built-in Python function
2:15
which will execute whatever code you pass in. And this is a very powerful way to let your LLM
2:21
write code and have you execute that code, although there are some security implications
2:26
which we'll see later in this video. And then there are also some tools that will let you run the code
2:31
in a safer sandbox environment. And of course, square root of two is a relatively simple example.
2:38
An LLM can also accurately write code to, for example, do interest calculations and solve much harder
2:45
math calculations than this. One refinement to this idea, which you sort of saw in our section
2:52
on reflection, is that if code execution fails, say because the LLM had generated code
2:58
that wasn't quite correct, you can pass that error message back to the LLM to let it reflect,
3:04
maybe revise the code, and try another one or two times. That can sometimes also allow it to get a
3:10
more accurate answer. Now, running arbitrary code that an LLM generates does have a small chance of
3:17
causing something bad to happen. Recently, one of my team members was using a highly agentic coder
3:24
and it actually chose to remove *.py within a project directory. So this is actually a real
3:30
example. And eventually that agentic coder did apologize. It said, yes, that's actually right,
3:35
that was an incredibly stupid mistake. I guess I was glad that this agentic coder was really sorry,
3:40
but it had already deleted a bunch of Python files. Fortunately, the team member had them backed
3:44
up in a GitHub repo, so there was no real harm done, but it would have been not great if this
3:50
arbitrary code, which made the mistake of deleting a bunch of files, had been executed
3:55
without the backup. So the best practice for code execution is to run it inside a sandbox
4:00
environment. In practice, the risk for any single line of code is not that high. So if I'm being
4:07
candid, many developers will execute code from the LLM without too much checking. But if you want to
4:12
be a bit safer, then the best practice is to create a sandbox so that if an LLM generates bad
4:19
code, there's a lower risk of data loss or leakage of sensitive data and so on. So sandbox
4:26
environments like Docker, or E2B as a lightweight sandbox environment, can reduce the risk of
4:33
arbitrary code being executed in a way that damages your system or your environment.
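As a very rough illustration (this is a hypothetical sketch, not a real sandbox), you could at least run generated code in a separate process with a timeout; proper isolation still means something like a Docker container or an E2B sandbox.

    import subprocess, sys

    def run_untrusted(code: str, timeout_s: int = 5) -> str:
        """Run LLM-generated code in a separate Python process with a timeout.
        This limits runaway scripts but does NOT protect the filesystem or network;
        use a real sandbox such as a Docker container or E2B for that."""
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True, timeout=timeout_s)
        return proc.stdout if proc.returncode == 0 else proc.stderr

    print(run_untrusted("print(2 ** 10)"))   # prints "1024"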
4:39
It turns out that code execution is so important that a lot of trainers of LLMs actually do special
4:46
work to make sure that code execution works well with their models. But I hope that you'll add
4:52
this as one more tool that you can potentially offer to LLMs to make your applications
4:58
much more powerful. So far in what we've discussed, you have to create tools and make them
5:05
available one at a time to your LLM. It turns out that many different teams are building similar
5:11
tools and having to do all this work of building functions and making them available to their LLMs.
5:18
But there is recently a new standard called MCP, Model Context Protocol, that's making it much
5:24
easier for developers to get access to a huge set of tools for LLMs to use. This is an important
5:31
protocol that more and more teams are using to develop LLM-based applications. Let's go learn
5:37
about MCP in the next video.

3.5 MCP

0:00
MCP, the Model Context Protocol, is a standard proposed by Anthropic and now adopted by many
0:07
other companies and by many developers as a way to give an LLM access to more context and to
0:13
more tools. There are a lot of developers developing around the MCP ecosystem and so
0:18
learning about this will give you a lot more access to resources for your applications.
0:24
Let's take a look. These are the pain points that MCP attempts to solve. If one developer is writing
0:31
an application that wants to integrate with data from Slack and Google Drive and GitHub or access
0:37
data from a Postgres database, then they might have to write code to wrap around Slack APIs
0:42
to have functions to provide to the application, write code to wrap around Google Drive APIs to
0:47
pass to the application, and similarly for these other tools or data sources. Then what has been
0:54
happening in the developer community is if a different team is building a different application,
0:59
then they too will integrate by themselves with Slack and Google Drive and GitHub and so on.
1:04
So many developers were all building custom wrappers around these types of data sources.
1:10
And so if there are M applications being developed and there are N tools out there,
1:16
the total amount of work done by the community was M times N. What MCP did was propose a standard
1:24
for applications to get access to tools and data sources so that the total work that needs to be
1:29
done by the community is now M plus N rather than M times N, since each application implements one MCP client and each tool is wrapped in one MCP server. The initial design of MCP focused a lot
1:38
on how to give more context to an LLM or how to fetch data. So a lot of the initial tools were
1:44
ones that would just fetch data. And if you read the MCP documentation, that refers to these as
1:51
resources. But MCP gives access to both data as well as the more general functions that an
1:57
application may want to call. And it turns out that there are many MCP clients. These are the
2:04
applications that want access to tools or to data, as well as MCP servers, which are often the software
2:11
wrappers that then give access to data in Slack or GitHub or Google Drive, or allow you to take
2:16
actions on these different types of resources. So today there's a rapidly growing list of MCP
2:23
clients that consume the tools or the resources, as well as MCP servers that provide the tools and
2:28
the resources. And I hope that you find it useful to build your own MCP client. Your application
2:35
may one day be an MCP client. And if you want to provide resources to other developers,
2:40
maybe you can build your own MCP server someday.
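If you do want to try that someday, a toy server exposing a single tool might look roughly like the sketch below, written against the official MCP Python SDK's FastMCP helper; exact import paths and decorators can vary between SDK versions, so treat this as an assumption-laden sketch.

    # Hypothetical minimal MCP server exposing one tool, using the MCP Python SDK's FastMCP.
    from datetime import datetime
    from zoneinfo import ZoneInfo
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("time-server")

    @mcp.tool()
    def get_current_time(timezone: str) -> str:
        """Get the current time in the given IANA time zone, e.g. 'Pacific/Auckland'."""
        return datetime.now(ZoneInfo(timezone)).strftime("%H:%M")

    if __name__ == "__main__":
        mcp.run()   # an MCP client, such as a desktop assistant, can now call this tool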
2:46
Let me show you a quick example of using an MCP client. This is the Claude desktop app, and it has been connected to a GitHub MCP server. So when
2:54
I enter this query, summarize the readme.md from the GitHub repo at this URL, this is actually
3:00
the AISuite repo. Then this application, which is an MCP client, uses the GitHub MCP server with
3:07
the request, please get the file readme.md from the AISuite repo. And then it
3:14
gets this response, which is pretty long. All this is then fed back to the LLM's context and the LLM
3:21
then generates the summary of the markdown file. Now let me enter another request, which is,
3:27
what are the latest pull requests. This in turn causes the LLM to use the MCP server to make
3:36
a different request, to list the pull requests. This is another tool provided by GitHub's MCP
3:42
server. And so it makes this request with the repo AISuite, sorted by last updated, listing 20, and so on.
3:50
And then it gives this response, which is fed back to the LLM and the LLM then writes this nice
3:55
text summary of the latest pull requests for this repo. MCP is an important standard. If you want to
4:01
learn more about it, DeepLearning.ai also has a short course that goes much deeper into just the MCP
4:08
protocol that you can check out after finishing the course, if you're interested. I hope this
4:13
video gives you a brief overview of why it's useful and also why many developers are now building to
4:20
this standard. This brings us to the end of the videos on tool use. And I hope that by giving your LLM access
4:27
to tools, you will build agentic applications that are much more powerful. In the next module,
4:34
we'll talk about evaluations and error analysis. It turns out that one of the things I've seen
4:41
that distinguishes teams that can execute agentic workflows really well versus teams that are not
4:47
as efficient at it is their ability to drive a disciplined evaluation process. In the next
4:55
set of videos, which I think is maybe the most important module of this entire course,
5:00
I hope to share with you some of the best practices of how to use evals to drive
5:05
development of agentic workflows. Look forward to seeing you in the next module.