
How We Built an AI Agent That Configures Your Image CDN — 5 Lessons From Shipping the AuraImage MCP Server

Narek Hakobyan

Your AI coding agent can now set up an entire image pipeline — upload, CDN transforms, BlurHash, <picture> tags, alt text, LCP telemetry — in one command. Here is what we learned building that.

We started with the wrong abstraction

When we first designed the MCP server, we built one mega-tool, setup_auraimage, that tried to do everything. It would walk the project, find images, upload them, rewrite imports, configure the CDN, and spit out a summary. It took 45 seconds and broke on every edge case.

The fix was counter-intuitive: we split it into five single-purpose tools that each do exactly one thing and compose well:

  • audit_lcp — scan a directory, report unoptimized images, estimate LCP savings
  • migrate_assets — upload local images to the CDN, rewrite <img> tags
  • generate_alt — produce accessible alt text via Gemini vision
  • generate_responsive_tag — return a <picture> element with AVIF/WebP srcsets
  • smart_crop_preview — show all four crop variants for one image

The lesson: AI agents need small, composable primitives, not monolithic workflows. The agent chains them together better than any deterministic pipeline could, because it has context about the user's project that our server doesn't.
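
A minimal sketch of what small, composable primitives can look like, written as plain TypeScript rather than against any particular MCP SDK. Only the tool names come from the list above; the Tool interface, the handler bodies, the 100 KB threshold, and the cdn.example.com host are assumptions made for illustration, not the actual AuraImage implementation.

```typescript
// Hypothetical shape for a single-purpose tool; not the real MCP SDK types.
interface Tool<In, Out> {
  name: string;
  description: string;
  run(input: In): Out;
}

interface AuditReport {
  unoptimized: { path: string; bytes: number }[];
}

interface MigrationReport {
  migrated: { path: string; cdnUrl: string }[];
}

// Each tool does exactly one thing and returns structured data.
const auditLcp: Tool<{ files: { path: string; bytes: number }[] }, AuditReport> = {
  name: "audit_lcp",
  description: "Reports images that are larger than a size threshold.",
  run: ({ files }) => ({
    // Assumed threshold of 100 KB, purely for illustration.
    unoptimized: files.filter((f) => f.bytes > 100_000),
  }),
};

const migrateAssets: Tool<{ paths: string[] }, MigrationReport> = {
  name: "migrate_assets",
  description: "Maps local image paths to CDN URLs.",
  run: ({ paths }) => ({
    // Placeholder host; a real tool would upload each file first.
    migrated: paths.map((p) => ({ path: p, cdnUrl: `https://cdn.example.com/${p}` })),
  }),
};

// The caller (in practice, the agent) decides how to chain the tools.
const report = auditLcp.run({
  files: [
    { path: "public/hero.png", bytes: 2_400_000 },
    { path: "public/icon.svg", bytes: 1_200 },
  ],
});
const result = migrateAssets.run({ paths: report.unoptimized.map((f) => f.path) });
```

Because each tool returns plain data, the output of one can feed the next without either tool knowing about the other, which is what lets the agent pick the order.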

Tool descriptions are the new API docs

When a human reads API docs, they skip the intro, scan the method signatures, and copy-paste the example. An AI agent relies on tool descriptions in the same way: the description is the documentation it uses to decide when and how to call a tool.

We iterated the description strings more than the implementations. The final audit_lcp description is:

"Scans a project directory for unoptimized images (files in public/ and <img> tags not pointing to AuraImage CDN) and estimates LCP savings. Returns a report with file paths, current sizes, and estimated savings if migrated."

Compare that with our first draft — "Audit LCP" — which the agent would sometimes ignore because it didn't know when to invoke it.

Specific lessons:

  1. Describe when to use the tool, not just what it does
  2. Mention side effects (e.g., "this rewrites your JSX files" on migrate_assets)
  3. Give concrete output examples — agents pattern-match on response shapes
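
As a rough illustration of the three lessons, here is a sketch of a description assembled from "when to use", "side effects", and "example output" parts. The ToolDefinition shape, the description wording, the example payload, and the isTooTerse helper with its five-word threshold are all hypothetical; the shipped descriptions may differ.

```typescript
// Hypothetical tool definition; the wording below is illustrative only.
interface ToolDefinition {
  name: string;
  description: string;
}

const migrateAssetsDefinition: ToolDefinition = {
  name: "migrate_assets",
  description: [
    // 1. When to use it.
    "Use after audit_lcp has reported unoptimized local images.",
    // 2. Side effects.
    "Uploads local images to the CDN. Side effect: this rewrites your JSX files.",
    // 3. Concrete output example.
    'Example output: {"migrated":[{"path":"public/hero.png","cdnUrl":"https://cdn.example.com/hero.png"}]}',
  ].join(" "),
};

// Rough lint: flags descriptions too short to say when the tool applies.
// The five-word threshold is an arbitrary assumption.
function isTooTerse(description: string): boolean {
  return description.trim().split(/\s+/).length < 5;
}

const firstDraftIsTerse = isTooTerse("Audit LCP"); // true
const finalDraftIsTerse = isTooTerse(migrateAssetsDefinition.description); // false
```

A check like this will not tell you whether a description is good, but it is a cheap way to catch label-style drafts such as "Audit LCP" before an agent ever sees them.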

The upload token model was the hardest part

The migrate_assets tool uses HMAC-signed upload tokens: the MCP server signs a token with the user's Secret Key, hands it to the client, and the file uploads directly to the CDN edge without touching the user's backend.

The tricky part: the agent only has a stdio channel to the MCP server. It can't open a browser, can't read the clipboard, and can't interact with the filesystem outside the project directory.

Our migrate_assets tool had to:

  1. Accept a projectName parameter
  2. NOT require the Secret Key in the prompt (that would leak it to the LLM context)
  3. Read the key from AURA_SECRET_KEY env var instead
  4. Return a structured report with the new CDN URLs so the agent could update references

This is the kind of constraint that only emerges when you actually build for agents, not just API consumers.
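
The key-handling requirement can be sketched with Node's built-in crypto module. The post does not describe the real token format, so the payload fields, the "body.signature" layout, and the function names below are assumptions; only the AURA_SECRET_KEY variable and the use of HMAC come from the text.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical token format: base64url(payload) + "." + hex HMAC-SHA256 signature.
interface UploadTokenPayload {
  projectName: string;
  expiresAt: number; // Unix epoch milliseconds
}

function readSecretKey(): string {
  // The key comes from the environment, never from tool arguments,
  // so it does not enter the LLM context.
  const key = process.env.AURA_SECRET_KEY;
  if (!key) throw new Error("AURA_SECRET_KEY is not set");
  return key;
}

function signUploadToken(payload: UploadTokenPayload, secretKey: string): string {
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  const signature = createHmac("sha256", secretKey).update(body).digest("hex");
  return `${body}.${signature}`;
}

function verifyUploadToken(token: string, secretKey: string): UploadTokenPayload | null {
  const [body, signature] = token.split(".");
  if (!body || !signature) return null;
  const expected = createHmac("sha256", secretKey).update(body).digest("hex");
  const a = Buffer.from(signature, "hex");
  const b = Buffer.from(expected, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null;
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as UploadTokenPayload;
  // Reject tokens whose expiry has passed.
  return payload.expiresAt > Date.now() ? payload : null;
}
```

The tool handler would call readSecretKey() itself, so the only values that cross the stdio channel are projectName going in and the report coming out.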

Alt text generation taught us about agent trust

The generate_alt tool sends the image to Gemini and returns alt text. We originally returned:

{ "alt": "Two engineers reviewing code on a laptop" }

The agent would copy-paste this into <img alt="Two engineers reviewing code on a laptop" /> without question. But what if the vision model got it wrong? A person would notice; an agent wouldn't.

We added a confidence field and a note:

{ "alt": "Two engineers reviewing code on a laptop", "confidence": "medium", "note": "Could not determine gender or age" }

This gave the agent enough information to decide whether to accept the suggestion or ask the user.

The general principle: when an agent consumes AI-generated output, include quality signals so it can make informed decisions about whether to use the result.
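
One way a consumer might act on those quality signals is sketched below. The field names mirror the JSON above, but the set of confidence levels ("low" and "high" are assumed; the post only shows "medium") and the decision policy are illustrative assumptions.

```typescript
// Field names mirror the JSON above; the levels and policy are assumptions.
type Confidence = "low" | "medium" | "high";

interface AltTextResult {
  alt: string;
  confidence: Confidence;
  note?: string;
}

type Decision =
  | { action: "apply"; alt: string }
  | { action: "ask_user"; alt: string; reason: string };

// Apply high-confidence suggestions; route everything else to the user.
function decideAltText(result: AltTextResult): Decision {
  if (result.confidence === "high") {
    return { action: "apply", alt: result.alt };
  }
  return {
    action: "ask_user",
    alt: result.alt,
    reason: result.note ?? `Confidence is ${result.confidence}`,
  };
}

const decision = decideAltText({
  alt: "Two engineers reviewing code on a laptop",
  confidence: "medium",
  note: "Could not determine gender or age",
});
```

With the example payload, the decision is to ask the user, and the note is passed along as the reason.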

The CDN URL API is the agent interface

We spent weeks debating whether agents need a special "agent API" with structured outputs, separate endpoints, or JSON responses.

Turns out, the existing URL transform API (?w=800&h=600&fit=face) works perfectly for agents. generate_responsive_tag and smart_crop_preview both construct URLs using the same API that human developers use in their code.

The only "agent-specific" thing we added: the tools return URLs as strings, not HTML, so the agent can compose them into whatever format the user's project needs (JSX, Svelte, Vue, even plain Markdown).

Lesson: don't build a separate agent API. Make your existing API composable, and agents will figure out the rest.
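
A small sketch of how a tool could build transform URLs and return them as plain strings. The w, h, and fit query parameters come from the example above; the cdn.example.com host, the path, and the helper names are placeholders, not the real AuraImage API.

```typescript
// Hypothetical helpers; only the w, h, and fit parameters come from the post.
type TransformOptions = {
  w?: number;
  h?: number;
  fit?: string;
};

function transformUrl(baseUrl: string, options: TransformOptions): string {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(options)) {
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Returns a srcset value as a plain string so the caller can embed it in
// JSX, Svelte, Vue, or Markdown.
function buildSrcset(baseUrl: string, widths: number[]): string {
  return widths.map((w) => `${transformUrl(baseUrl, { w })} ${w}w`).join(", ");
}

const faceCrop = transformUrl("https://cdn.example.com/demo/team.jpg", {
  w: 800,
  h: 600,
  fit: "face",
});
// "https://cdn.example.com/demo/team.jpg?w=800&h=600&fit=face"

const srcset = buildSrcset("https://cdn.example.com/demo/team.jpg", [400, 800]);
```

Returning strings rather than markup keeps the helper framework-agnostic; the agent decides whether the result ends up in an img tag, a picture element, or a Markdown link.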

What we'd do differently

  1. Ship a dryRun flag on every tool from day one. Agents are eager beavers — if you give them a tool that modifies files, they'll use it. We added dryRun to migrate_assets after the agent migrated a user's entire public/ directory before they confirmed.

  2. Don't use abbreviations in tool names. audit_lcp works because LCP is a well-known metric. But we originally had gen_alt and the agent couldn't figure out it meant "generate alt text." Full words, always.

  3. Version the tool schemas. We broke migrate_assets when we changed slug to projectName in the parameter schema. The agent had memorized the old parameter name. Next time, we'll add a version field to each tool response so the agent knows the schema changed.
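
The first and third points can be combined in one sketch: a file-modifying tool that takes a dryRun flag and reports a schema version in its response. Only projectName and dryRun come from the post; the response shape, the schemaVersion field name and value, the placeholder host, and the choice to default dryRun to true are assumptions.

```typescript
// Hypothetical request/response shapes; only projectName and dryRun come from the post.
interface MigrateRequest {
  projectName: string;
  dryRun?: boolean;
}

interface MigrateResponse {
  schemaVersion: number; // lets a caller detect parameter or shape changes
  dryRun: boolean;
  applied: boolean;
  changes: { path: string; cdnUrl: string }[];
}

const SCHEMA_VERSION = 2; // assumed value for illustration

function migrateAssets(
  request: MigrateRequest,
  localImages: string[],
  upload: (path: string) => void,
): MigrateResponse {
  // Default to a dry run so nothing is modified without an explicit opt-in.
  const dryRun = request.dryRun ?? true;
  const changes = localImages.map((path) => ({
    path,
    cdnUrl: `https://cdn.example.com/${request.projectName}/${path}`,
  }));
  if (!dryRun) {
    for (const change of changes) upload(change.path);
  }
  return { schemaVersion: SCHEMA_VERSION, dryRun, applied: !dryRun, changes };
}
```

Calling it without dryRun returns the planned changes and touches nothing, so the agent can show the plan to the user and only then call again with dryRun set to false.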

What this means for the future

The MCP ecosystem is still young, but the pattern is clear: tools that are narrow, well-described, and composable outperform monoliths, regardless of how "smart" the AI agent is.

Google's UCP (Universal Commerce Protocol) is building on similar ideas for commerce — give agents small, well-typed actions they can compose into workflows. The principles we learned apply beyond image infrastructure.

If you are building your own MCP tool, start with one tool that does one thing well and has a crystal-clear description. You will know you got it right when you don't need to write any agent prompt engineering — the agent just calls it correctly on the first try.


We open-sourced the MCP server at github.com/auraimage/mcp-server. Install it with npx @auraimage/mcp-server@latest and give audit_lcp a try on your project.
