# The Part Nobody Explains: How AI Agents Decide What To Do

In the last post, we saw this:

AI can use tools.

It can:

- search the web
- run code
- open files
- call APIs

Cool.

But something still feels… missing.

Because there's one question almost nobody properly explains:

**How does the AI decide which tool to use?**

---

## The Hidden Step Most People Never See

When you type something like:

> "What's the weather in London?"

the AI doesn't just magically "know" what to do.

Under the hood, something more structured is happening.

Instead of replying normally, the model generates a **structured tool call** — essentially an instruction that says:

*"Use this tool with these inputs."*

The exact format varies by model and framework, but the idea is always the same: instead of plain text, the model outputs a structured action for the system to execute.

And honestly, this is the moment AI agents started making more sense to me.

Because suddenly it stopped feeling like magic.

---

## At First, I Assumed One Giant Model Did Everything

My original mental model was basically:

*"Okay… GPT probably handles everything itself."*

Reasoning. Planning. Tool selection. Responses. Memory. Everything.

And to be fair, a lot of modern systems actually do work like that.

Large models from companies like OpenAI, Anthropic, and Google Gemini can often decide which tools to call directly.

But then I discovered something interesting.

Some systems use **smaller, specialized models** just for tool calling.

And that's where FunctionGemma comes in.

---

## Meet FunctionGemma

FunctionGemma is a specialized version of Google's Gemma model built specifically for function calling.

Not chatting. Not storytelling. Not writing essays.

Its main job is:

**Take user intent → convert it into structured tool calls.**

That's it.

And honestly, I think that idea is fascinating.

Because instead of trying to make one giant model do everything…

you can split the system into smaller, focused parts.

---

## And Here's the Wild Part

FunctionGemma is **tiny** compared to modern large language models.

It's built on the **Gemma 3 270M parameter model**.

Which sounds ridiculously small in today's AI world.

But the important realization is this:

> Tool calling is actually a much narrower problem than open-ended conversation.

The model doesn't need to:

- write novels
- explain philosophy
- debate politics

It mainly needs to:

- understand intent
- choose a function
- generate structured outputs correctly

That constrained problem makes smaller specialized models surprisingly effective — especially after fine-tuning.

And the size has another huge advantage: FunctionGemma is specifically designed to run **on-device** — on laptops, phones, or edge hardware — without needing a server. That's a big deal for privacy and offline use.

---

## This Completely Changed How I See AI Agents

Before this, I imagined agents like:

*"One super-intelligent AI doing everything."*

But now I think of them more like **systems made of layers**:

```
Main model         → reasoning and conversation
Tool-calling layer → converts intent into actions
Tools / APIs       → actually perform the work
Orchestration      → manages the loop, retries, and flow
```

Almost like:

**Brain → Decision Layer → Hands**

And suddenly agents feel a lot less mystical. And a lot more understandable.

---

## So What Actually Happens in an Agent?

At a very basic level, the loop is surprisingly simple.

User says:

> "Search AI startups"

The system generates a structured tool call, something like:

```
tool: search_web
query: "AI startups"
```

Then:

1. The backend executes the tool
2. Gets the results
3. Sends them back to the model
4. The model continues

**Think → Call tool → Get result → Continue**

That's the core loop.

---

## Real Agents Add More Layers

Of course, production agents usually become much more complex.

They may include:

- memory
- retries
- validation
- planning
- permissions
- context management
- error handling

But the important thing is: **the core idea is still understandable.**

And honestly, that realization made AI feel way more approachable to me.

---

## One Important Thing Most People Miss

FunctionGemma is not really meant to be dropped in "as-is" as a universal agent model.

Google actually positions it as **a foundation for fine-tuning**.

Meaning:

- You define your own tools
- Train it on your own examples
- Improve its reliability for your specific use case

So instead of:

> "one AI that knows everything"

you get:

> "a small, specialized model trained for your specific workflows"

That's a very different philosophy. And the benchmark numbers back it up — the base model scores 58% on Mobile Actions tasks. After fine-tuning? **85%.**

---

## And Honestly… I Think This Is Where AI Is Heading

The more I learn about agents, the more it feels like modern AI systems are becoming **modular**.

Instead of one massive model doing everything, we're starting to see:

- routers
- planners
- memory systems
- verification models
- specialized tool-callers
- local edge models

Smaller parts working together.

And weirdly… that makes the whole field feel less intimidating.

---

## What I'm Doing Next

So instead of only reading about agents…

I want to **build one myself.**

A small one. From scratch. Nothing insane.

Just:

- a few tools
- a simple loop
- tool-calling logic
- structured outputs
- backend execution

And I'll document the whole process here as I learn.

---

## One Line Worth Remembering

> AI agents are not just "one giant AI."
> They're systems made of smaller parts.
> And sometimes… the part deciding what action to take next can be surprisingly small.
