
On AI Agents, MCP, and Tool Selection


One of the things that's been worrying me about the current AI landscape is how we handle tools, or MCP, if you prefer. At the end of the day, MCP is essentially a wrapper on top of tools that allows them to be exposed and called by a framework. But here's the thing: when we present tools (or MCP servers) to an LLM, we're basically giving it a bunch of things it can call to get more information. And that's where the problem starts.
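To make that concrete, here's a minimal sketch of a tool exposed through MCP, using the FastMCP helper from the official Python MCP SDK. The server name and the invoice tool are made up for illustration:

```python
# A made-up tool exposed through MCP: the decorator wraps a plain Python
# function so any MCP client (and the LLM behind it) can discover and call it.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("invoice-tools")

@mcp.tool()
def get_invoice_status(invoice_id: str) -> str:
    """Return the payment status for a given invoice."""
    # A real implementation would query your billing system here.
    return f"Invoice {invoice_id}: paid on 2024-03-12 by bank transfer"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for any MCP-compatible client
```

The wrapper itself is trivial; the interesting part is what happens once a framework starts showing that tool, and dozens like it, to a model.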

The issue is that a lot of users don't really understand what tools mean to an LLM. Let me share an analogy I gave to some friends the other day. Imagine you walk into a new restaurant, a three-Michelin-star place. You sit down, and there are four forks on the left and four knives on the right. Then the waiter comes with your food. It's a burger. Now you're sitting in this fancy three-star establishment with four forks and four knives in front of you. Are you going to use the forks? You don't know. You're not sure whether the burger should be eaten by hand, because you have all these utensils available. So why are all the forks and knives there if you're just getting a burger?


This is exactly what happens with MCP tools. When you define a bunch of tools, you're inviting the LLM or AI agent to use them. By giving these tools to the system, you're providing detailed information on how to use them, even if they're not relevant to the task at hand. One thing that's made me really happy recently is that, over the last month, Anthropic has been changing how they filter and score tools using another agent layer, making the context clearer and narrower. So for me, there are two kinds of agents that are important to understand when thinking about what we want to achieve.

The Global Agent

The first type I'd suggest is what I call the global agent. This could be a chat interface or something similar, where the user understands that the information might not be 100% accurate; it's more about giving hints and direction. In this scenario, you actually want a lot of tools: internet search, read-only database access, dashboard integration, tools to explore documents on your intranet. In the end, you want access to a wide range of information sources.

This kind of agent will have many tools available because it's designed for exploration. You don't need perfect accuracy here; you want to give users hints about how they can find information. This isn't about narrow, focused tasks; it's about exploration and discovery. Think of it as asking the agent to "figure this out for me" when you're not even sure exactly what you're looking for yet. Yes, this will be more expensive in terms of input tokens, but you're giving users the ability to explore and move forward with an exploratory answer and a summary of what's possible.
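As a rough sketch of what that looks like in practice, here's a global agent wired up with a broad tool manifest. The tool names are invented; the call shape follows the OpenAI chat-completions tools API:

```python
# A sketch of a global agent's tool manifest: broad, exploratory coverage.
# The tool names and schemas are hypothetical.
from openai import OpenAI

client = OpenAI()

def tool(name: str, description: str) -> dict:
    """Helper to build a minimal function-tool entry."""
    return {"type": "function",
            "function": {"name": name, "description": description,
                         "parameters": {"type": "object", "properties": {
                             "query": {"type": "string"}}}}}

# Many tools on purpose: this agent is for exploration, not precision.
exploratory_tools = [
    tool("internet_search", "Search the public web."),
    tool("db_read_query", "Run a read-only SQL query against the warehouse."),
    tool("dashboard_lookup", "Fetch metrics from internal dashboards."),
    tool("intranet_docs", "Search documents on the company intranet."),
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Where do we track churn?"}],
    tools=exploratory_tools,  # wide surface area, higher token cost, more hints
)
```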

The Playbook Agent

The second type is what I call the playbook agent. This is a single-task agent that you need to repeat reliably; it has some interesting complexity but needs to be executed correctly every time. Let me give you an example: you have a process where, every time an invoice is processed in your system, you need to check whether you've already paid it, how it was paid, when it was paid, and to whom. This has some inherent uncertainty: maybe there's no straightforward way to handle it with traditional development approaches, or maybe there are edge cases you don't want to accidentally trigger.

For this kind of agent, the LLM should be narrowed to a specific scope. You know what the inputs and outputs are, and you have a clear blueprint of tasks that need to be executed. In this case, it's super important to have fewer tools: the fewer tools available, the more reliable the LLM will be for this use case. You have a blueprint already defined, with specific tools for checking the invoice, verifying payment status, and so on. The context stays small, the task is self-contained within clear limits, and you don't invite the LLM to use other tools that might confuse it or lead it astray.
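Here's the same pattern narrowed down, reusing the `tool()` helper and client from the sketch above. The blueprint prompt and the three tool names are illustrative, not a real implementation:

```python
# A playbook agent: a fixed blueprint in the system prompt and only the
# three hypothetical tools the task needs.
PLAYBOOK = """You process exactly one invoice per run.
Steps, in order:
1. check_invoice(invoice_id) -> invoice details
2. verify_payment(invoice_id) -> paid/unpaid, method, date, payee
3. record_result(invoice_id, summary)
Never do anything outside these steps."""

playbook_tools = [
    tool("check_invoice", "Fetch invoice details by id."),
    tool("verify_payment", "Check whether and how an invoice was paid."),
    tool("record_result", "Persist the outcome of the check."),
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "system", "content": PLAYBOOK},
              {"role": "user", "content": "Process invoice INV-1042."}],
    tools=playbook_tools,    # small, fixed toolset: nothing to wander into
    tool_choice="required",  # force the model to act through the blueprint
)
```

The design choice is the whole point: there is no internet search here, no dashboard tool, nothing the model could be tempted to reach for outside the blueprint.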

Conclusion

So here's what I'm thinking: you have these two distinct types of agents, and if you're building agents right now, you need to be really thoughtful about tool selection.

If you want to build an agent that has to execute reliably, don't just throw every MCP tool at it. Use strict planning and create a clear blueprint for when tools should be used. Don't add tools that aren't necessary, because when you do, you're essentially inviting the LLM to use them, and that may lead to hallucinations or incorrect behavior.

This is especially critical when you want goal-oriented behavior. The scope of the task might be well-defined, but there's uncertainty in the execution. On the other hand, if you're building a chatbot for your company users—something more exploratory—then you do want to give the LLM more tools and more information. You're accepting lower accuracy in exchange for giving users hints about what can be done and what paths they can explore. That's important for the end goal: providing direction to users.

Agent-to-Agent Protocol

Finally, one of the reasons I'm really excited about agent-to-agent protocols is that they let you have multiple agents, each with a super narrow context and highly specific information. Each agent has its own blueprint and set of tasks, allowing you to define well-scoped agents with precisely the tools they need, validated by different teams. You can see how multiple agents interact with this information, each essentially saying "give me just the information I need."

When you have an agent-to-agent protocol, the LLM in your chat interface (using something like the OpenAI specification) can choose to call another agent as a tool. That agent might have its own complete set of tools, MCPs, and specialized information.
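A rough sketch of that idea, again with hypothetical names and reusing the pieces from the earlier sketches: the chat-level LLM only sees one `invoice_agent` tool, and calling it runs the whole playbook agent, tools and all:

```python
# Agent-as-a-tool: the top-level model sees a single entry point; the
# sub-agent behind it carries its own narrow toolset and blueprint.
import json

def run_invoice_agent(task: str) -> str:
    """The playbook agent from above, wrapped behind one entry point."""
    result = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": PLAYBOOK},
                  {"role": "user", "content": task}],
        tools=playbook_tools,
    )
    return result.choices[0].message.content or ""

top_level_tools = [tool("invoice_agent",
                        "Delegate any invoice-payment question to a "
                        "specialised agent with its own tools.")]

msg = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Was INV-1042 already paid?"}],
    tools=top_level_tools,
).choices[0].message

# If the model chose to delegate, run the sub-agent and hand back its answer.
if msg.tool_calls:
    args = json.loads(msg.tool_calls[0].function.arguments)
    answer = run_invoice_agent(args.get("query", ""))
```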

The landscape is evolving. Anthropic's recent MCP tool-filtering feature is a game changer because it acts as a proxy that uses an LLM to decide which tools should be presented for a specific use case. This has made me really happy because I'm seeing better results. While this isn't yet globally available for all models, it's coming; if not this exact approach, then something similar. And it's going to be a big deal for all of us building with these tools.
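I don't know how Anthropic's proxy is implemented internally, but the pattern is easy to approximate: a cheap model scores the catalog first, and only the winners are exposed to the main call. A rough sketch of my own, reusing the exploratory tool manifest from above:

```python
# A rough approximation of LLM-based tool filtering, not Anthropic's
# implementation: a small model picks the few relevant tools up front.
import json

def select_tools(user_message: str, catalog: list[dict], limit: int = 3) -> list[dict]:
    """Ask a small model which tools from the catalog fit this request."""
    names = [t["function"]["name"] for t in catalog]
    picker = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system",
                   "content": f"Pick at most {limit} tool names from {names} "
                              "that help answer the user. Reply as a JSON list."},
                  {"role": "user", "content": user_message}],
    )
    # A production version would validate this JSON instead of trusting it.
    picked = json.loads(picker.choices[0].message.content)
    return [t for t in catalog if t["function"]["name"] in picked]

question = "Where do we track churn?"
narrowed = select_tools(question, exploratory_tools)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
    tools=narrowed,  # the model only sees the forks it actually needs
)
```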
