How AI Agents Decide What To Do (Explained Simply)

In the last post, we saw this:

AI can use tools.

It can:

search the web
run code
open files
call APIs

Cool.

But something still feels… missing.

Because there's one question almost nobody properly explains:

How does the AI decide which tool to use?

The Hidden Step Most People Never See

When you type something like:

"What's the weather in London?"

the AI doesn't just magically "know" what to do.

Under the hood, something more structured is happening.

Instead of replying normally, the model generates a structured tool call — essentially an instruction that says:

"Use this tool with these inputs."

The exact format varies by model and framework, but the idea is always the same: instead of plain text, the model outputs a structured action for the system to execute.

And honestly, this is the moment AI agents started making more sense to me.

Because suddenly it stopped feeling like magic.

At First, I Assumed One Giant Model Did Everything

My original mental model was basically:

"Okay… GPT probably handles everything itself."

Reasoning. Planning. Tool selection. Responses. Memory. Everything.

And to be fair, a lot of modern systems actually do work like that.

Large models from companies like OpenAI, Anthropic, and Google Gemini can often decide which tools to call directly.

But then I discovered something interesting.

Some systems use smaller, specialized models just for tool calling.

And that's where FunctionGemma comes in.

Meet FunctionGemma

FunctionGemma is a specialized version of Google's Gemma model built specifically for function calling.

Not chatting. Not storytelling. Not writing essays.

Its main job is:

Take user intent → convert it into structured tool calls.

That's it.

And honestly, I think that idea is fascinating.

Because instead of trying to make one giant model do everything…

you can split the system into smaller, focused parts.

And Here's the Wild Part

FunctionGemma is tiny compared to modern large language models.

It's built on the Gemma 3 270M parameter model.

Which sounds ridiculously small in today's AI world.

But the important realization is this:

Tool calling is actually a much narrower problem than open-ended conversation.

The model doesn't need to:

write novels
explain philosophy
debate politics

It mainly needs to:

understand intent
choose a function
generate structured outputs correctly

That constrained problem makes smaller specialized models surprisingly effective — especially after fine-tuning.

And the size has another huge advantage: FunctionGemma is specifically designed to run on-device — on laptops, phones, or edge hardware — without needing a server. That's a big deal for privacy and offline use.

This Completely Changed How I See AI Agents

Before this, I imagined agents like:

"One super-intelligent AI doing everything."

But now I think of them more like systems made of layers:

Main model         → reasoning and conversation
Tool-calling layer → converts intent into actions
Tools / APIs       → actually perform the work
Orchestration      → manages the loop, retries, and flow

Almost like:

Brain → Decision Layer → Hands

And suddenly agents feel a lot less mystical. And a lot more understandable.

So What Actually Happens in an Agent?

At a very basic level, the loop is surprisingly simple.

User says:

"Search AI startups"

The system generates a structured tool call, something like:

tool: search_web
query: "AI startups"

Then:

The backend executes the tool
Gets the results
Sends them back to the model
The model continues

Think → Call tool → Get result → Continue

That's the core loop.

Real Agents Add More Layers

Of course, production agents usually become much more complex.

They may include:

memory
retries
validation
planning
permissions
context management
error handling

But the important thing is: the core idea is still understandable.

And honestly, that realization made AI feel way more approachable to me.

One Important Thing Most People Miss

FunctionGemma is not really meant to be dropped in "as-is" as a universal agent model.

Google actually positions it as a foundation for fine-tuning.

Meaning:

You define your own tools
Train it on your own examples
Improve its reliability for your specific use case

So instead of:

"one AI that knows everything"

you get:

"a small, specialized model trained for your specific workflows"

That's a very different philosophy. And the benchmark numbers back it up — the base model scores 58% on Mobile Actions tasks. After fine-tuning? 85%.

And Honestly… I Think This Is Where AI Is Heading

The more I learn about agents, the more it feels like modern AI systems are becoming modular.

Instead of one massive model doing everything, we're starting to see:

routers
planners
memory systems
verification models
specialized tool-callers
local edge models

Smaller parts working together.

And weirdly… that makes the whole field feel less intimidating.

What I'm Doing Next

So instead of only reading about agents…

I want to build one myself.

A small one. From scratch. Nothing insane.

Just:

a few tools
a simple loop
tool-calling logic
structured outputs
backend execution

And I'll document the whole process here as I learn.

One Line Worth Remembering

AI agents are not just "one giant AI." They're systems made of smaller parts. And sometimes… the part deciding what action to take next can be surprisingly small.

The Part Nobody Explains: How AI Agents Decide What To Do

The Hidden Step Most People Never See

At First, I Assumed One Giant Model Did Everything

Meet FunctionGemma

And Here's the Wild Part

This Completely Changed How I See AI Agents

So What Actually Happens in an Agent?

Real Agents Add More Layers

One Important Thing Most People Miss

And Honestly… I Think This Is Where AI Is Heading

What I'm Doing Next

One Line Worth Remembering

Comments

AI Agents, Explained

What Is a "Tool" in AI Agents? (The Part That Makes Them Useful)

More from this blog

What Is a "Tool" in AI Agents? (The Part That Makes Them Useful)

what is AI Agent

Command Palette

The Hidden Step Most People Never See

At First, I Assumed One Giant Model Did Everything

Meet FunctionGemma

And Here's the Wild Part

This Completely Changed How I See AI Agents

So What Actually Happens in an Agent?

Real Agents Add More Layers

One Important Thing Most People Miss

And Honestly… I Think This Is Where AI Is Heading

What I'm Doing Next

One Line Worth Remembering

Comments

AI Agents, Explained

What Is a "Tool" in AI Agents? (The Part That Makes Them Useful)

More from this blog