How AI Agents Use External Tools to Become Truly Powerful
Introduction
A large language model on its own is remarkably capable — it can summarise text, answer questions, write code, and hold a conversation with impressive fluency. But it has real blind spots. Ask it to do precise arithmetic, generate an image, or fetch today’s news, and it starts to struggle.
So how do modern AI agents handle tasks that go beyond what a language model can do alone? The answer is function calling — one of the most powerful and underappreciated features in the AI world.
Think of Your LLM as an Operating System
Here’s a useful mental model: imagine your LLM not as a single all-knowing tool, but as an operating system — the central coordinator that manages everything happening around it.
Just like your laptop’s operating system doesn’t do everything itself (it relies on apps, browsers, calculators, and other programs), an LLM can delegate tasks to specialised tools when it hits the limits of its own capabilities.
AI researcher Andrej Karpathy has described this beautifully. In his framing:
- The context window acts like RAM — it’s the model’s active working memory
- External tools and APIs act like peripheral devices — things the model can reach out to and use
- File systems and vector databases act like long-term storage — persistent knowledge the model can reference
This reframe changes how you think about AI entirely. The LLM isn’t the whole machine — it’s the operating system running the machine.
What Is Function Calling?
Function calling is simply the ability of an LLM to recognise when it needs help from an external tool and then make a request to that tool, receive the result, and incorporate it into its response.
Here are some everyday examples of what that looks like in practice:
- Calculator — Instead of trying to compute a complex sum itself (and potentially getting it wrong), the model calls a calculator tool and returns the accurate result.
- Image generation — The model can’t draw, but it can call a diffusion model — a separate AI system specialised in generating images, video, or audio — and pass the result back to the user.
- Web browsing — When asked about something recent, the model can reach out to a browser tool to fetch up-to-date information it wouldn’t otherwise have access to.
- Python interpreter — Need a graph plotted or data processed? The model can call a Python environment, run the code, and return the output.
- Third-party services — Tools like Gmail, Google Calendar, CRMs, and more can all be connected, allowing the model to send emails, schedule meetings, or update records on your behalf.
The key insight is simple: if the model can’t do something well, it finds a tool that can.
A Real-World Example
Imagine asking an AI agent: “What is 123 times 76 divided by 5?”
Rather than attempting the calculation itself, a function-calling-enabled model recognises this as a maths task, sends it to a calculator tool, and returns the correct answer — seamlessly and accurately.
This same principle applies at much larger scale. An AI agent managing your business workflows might simultaneously have access to your email, your calendar, a web browser, and a document database — each one a specialised tool it can call upon depending on what the task requires.
Multi-Agent Systems: LLMs Talking to Each Other
Function calling doesn’t stop at connecting LLMs to software tools. It also enables LLMs to communicate with other LLMs.
Think of it like an executive team. A CEO-level model might receive a complex request and delegate portions of it to specialised sub-models — one handling financial analysis, another handling technical decisions, another reviewing legal implications. Each of those models can in turn have their own tools and even their own sub-agents.
This is how modern AI agent frameworks are structured. Rather than one model trying to be good at everything, you build a network of specialised agents that collaborate, each contributing its area of strength.
Not Every LLM Supports Function Calling
It’s worth noting that function calling isn’t a universal feature — it needs to be explicitly built into the model. The good news is that most leading models do support it, including GPT-4, the O3 models from OpenAI, and Llama 3. When evaluating any model for agent-based use, always check the documentation to confirm function calling support.
Key Takeaways
- An LLM works best when treated as an operating system — a coordinator that manages and delegates, rather than a tool expected to do everything itself.
- Function calling allows an LLM to reach out to external tools, APIs, and services to complete tasks beyond its own capabilities.
- Common tools include calculators, image generators, web browsers, Python interpreters, and third-party apps like Gmail or calendar services.
- LLMs can also call other LLMs, enabling the construction of sophisticated multi-agent systems where specialised models collaborate.
- Not all models support function calling — always verify in the documentation before building on top of a model.
Conclusion
Function calling transforms an LLM from a clever text processor into the brain of a genuinely capable AI system. By connecting it to the right tools — whether that’s a calculator, a browser, an image generator, or another AI model entirely — you dramatically expand what’s possible.
As you start building AI agents, this concept will come up again and again. The more tools you connect, the more capable your agent becomes — and understanding why that works makes you a far better builder.
