Skip to main content

Command Palette

Search for a command to run...

The Part Nobody Explains: How AI Agents Decide What To Do

Published
5 min read
The Part Nobody Explains: How AI Agents Decide What To Do
A
I am an Applied AI Builder and Explorer

In the last post, we saw this:

AI can use tools.

It can:

  • search the web
  • run code
  • open files
  • call APIs

Cool.

But something still feels… missing.

Because there's one question almost nobody properly explains:

How does the AI decide which tool to use?


The Hidden Step Most People Never See

When you type something like:

"What's the weather in London?"

the AI doesn't just magically "know" what to do.

Under the hood, something more structured is happening.

Instead of replying normally, the model generates a structured tool call — essentially an instruction that says:

"Use this tool with these inputs."

The exact format varies by model and framework, but the idea is always the same: instead of plain text, the model outputs a structured action for the system to execute.

And honestly, this is the moment AI agents started making more sense to me.

Because suddenly it stopped feeling like magic.


At First, I Assumed One Giant Model Did Everything

My original mental model was basically:

"Okay… GPT probably handles everything itself."

Reasoning. Planning. Tool selection. Responses. Memory. Everything.

And to be fair, a lot of modern systems actually do work like that.

Large models from companies like OpenAI, Anthropic, and Google Gemini can often decide which tools to call directly.

But then I discovered something interesting.

Some systems use smaller, specialized models just for tool calling.

And that's where FunctionGemma comes in.


Meet FunctionGemma

FunctionGemma is a specialized version of Google's Gemma model built specifically for function calling.

Not chatting. Not storytelling. Not writing essays.

Its main job is:

Take user intent → convert it into structured tool calls.

That's it.

And honestly, I think that idea is fascinating.

Because instead of trying to make one giant model do everything…

you can split the system into smaller, focused parts.


And Here's the Wild Part

FunctionGemma is tiny compared to modern large language models.

It's built on the Gemma 3 270M parameter model.

Which sounds ridiculously small in today's AI world.

But the important realization is this:

Tool calling is actually a much narrower problem than open-ended conversation.

The model doesn't need to:

  • write novels
  • explain philosophy
  • debate politics

It mainly needs to:

  • understand intent
  • choose a function
  • generate structured outputs correctly

That constrained problem makes smaller specialized models surprisingly effective — especially after fine-tuning.

And the size has another huge advantage: FunctionGemma is specifically designed to run on-device — on laptops, phones, or edge hardware — without needing a server. That's a big deal for privacy and offline use.


This Completely Changed How I See AI Agents

Before this, I imagined agents like:

"One super-intelligent AI doing everything."

But now I think of them more like systems made of layers:

Main model         → reasoning and conversation
Tool-calling layer → converts intent into actions
Tools / APIs       → actually perform the work
Orchestration      → manages the loop, retries, and flow

Almost like:

Brain → Decision Layer → Hands

And suddenly agents feel a lot less mystical. And a lot more understandable.


So What Actually Happens in an Agent?

At a very basic level, the loop is surprisingly simple.

User says:

"Search AI startups"

The system generates a structured tool call, something like:

tool: search_web
query: "AI startups"

Then:

  1. The backend executes the tool
  2. Gets the results
  3. Sends them back to the model
  4. The model continues

Think → Call tool → Get result → Continue

That's the core loop.


Real Agents Add More Layers

Of course, production agents usually become much more complex.

They may include:

  • memory
  • retries
  • validation
  • planning
  • permissions
  • context management
  • error handling

But the important thing is: the core idea is still understandable.

And honestly, that realization made AI feel way more approachable to me.


One Important Thing Most People Miss

FunctionGemma is not really meant to be dropped in "as-is" as a universal agent model.

Google actually positions it as a foundation for fine-tuning.

Meaning:

  • You define your own tools
  • Train it on your own examples
  • Improve its reliability for your specific use case

So instead of:

"one AI that knows everything"

you get:

"a small, specialized model trained for your specific workflows"

That's a very different philosophy. And the benchmark numbers back it up — the base model scores 58% on Mobile Actions tasks. After fine-tuning? 85%.


And Honestly… I Think This Is Where AI Is Heading

The more I learn about agents, the more it feels like modern AI systems are becoming modular.

Instead of one massive model doing everything, we're starting to see:

  • routers
  • planners
  • memory systems
  • verification models
  • specialized tool-callers
  • local edge models

Smaller parts working together.

And weirdly… that makes the whole field feel less intimidating.


What I'm Doing Next

So instead of only reading about agents…

I want to build one myself.

A small one. From scratch. Nothing insane.

Just:

  • a few tools
  • a simple loop
  • tool-calling logic
  • structured outputs
  • backend execution

And I'll document the whole process here as I learn.


One Line Worth Remembering

AI agents are not just "one giant AI." They're systems made of smaller parts. And sometimes… the part deciding what action to take next can be surprisingly small.

AI Agents, Explained

Part 1 of 3

A complete, practical guide to understanding how AI agents actually work — from tools and memory to workflows, RAG, and multi-agent systems. No hype, just clear explanations and real examples.

Up next

What Is a "Tool" in AI Agents? (The Part That Makes Them Useful)

In the last post, we said this: An agent = AI that keeps trying But that brings up a fair question. How does it actually do anything? How does it open a file? Search the web? Run code and check if i