From photo to Shopify draft in one prompt.
An agent can now run the whole pipeline — end to end, from a single line of intent.
A small fashion brand brings a new tee to market. The work behind that one SKU usually involves a photographer, a model, a location scout, a stylist, someone retouching, a copywriter, an SEO specialist, a taxonomy/data person, an ops account manager, and a coordinator stitching the chain together. Call it a ten-person catalog team. Multiply by a hundred SKUs and a fall drop. The seller's evenings disappear.
We've been building roopafy for sellers who don't have that team. Until last week, the path was still: open roopafy in a browser, walk the wizard, click a button. Useful, but still a human at the wheel. This week we shipped the steering wheel itself: roopafy is now an MCP server. An AI agent can drive the entire pipeline — upload a garment photo, generate a model shoot, write the listing with SEO, fill the Shopify taxonomy, and push a publishable draft — from a single prompt.
This post is the honest version of how we did it, how you set it up, what works today, and what doesn't yet.
The shift
For three years the conversation about "AI for e-commerce" has meant assistance: a writing helper, a description generator, a thumbnail upscaler. You still juggled six tabs and made every decision.
Agentic e-commerce is different. You give the agent a goal — "turn this garment into a publishable listing, lookbook aesthetic, premium tone" — and it makes the decisions. Which model. Which poses. Which background. Which words. What taxonomy attributes Shopify needs filled. Whether the first cut is good enough or worth one more pose. When to call it ready.
That has been technically possible for about a year — the language models are good enough — but operationally impossible: no platform exposed itself in a shape an agent could actually drive. MCP — Model Context Protocol — is the standard that fixed that. It lets any AI agent talk to any specialized platform through a common tool surface. roopafy is now one of those platforms.
What it looks like
A real prompt from yesterday's testing. The agent ran in Claude Cowork; I typed one sentence:
Create a roopafy listing from this image and walk it all the way through to a publish-ready draft: https://res.cloudinary.com/.../floral-crop-top.jpg
Eleven minutes later, the agent had:
Title: Floral Cap Sleeve Crop Top | Casual Women's Navy Multi
SEO title: Women's Floral Crop Top Casual Cap Sleeve Navy Purple Green
Description:
Burst into bloom with this playful floral crop top featuring a flattering round neckline and breezy cap sleeves made for golden hour adventures. The slim fit silhouette moves with you, keeping every casual look effortlessly flirty. Navy, purple, green, and white florals pop against any outfit.
Plus two more variations of each, nine additional attribute tags (including button-front placket · lace trim neckline · fitted silhouette · dark floral print · feminine styling), the full Shopify taxonomy auto-filled (Color: Navy + Purple + Green + White · Pattern: Floral · Neckline: Round · Sleeve length: Cap · Top length: Crop top · Target gender: Female · Age group: Teens + Adults), and seven model shots our roopafy AI agents composed for this garment:
Studio Identity · Courtyard Stride · Lace Detail Closeup · Street Market Over-the-Shoulder · Café Seated · Alley Profile · Golden Light Closing
Notice the third shot — a beauty-shot framing of the floral print and round neckline. That wasn't a generic template. Our roopafy AI agents are fashion-native: our own virtual try-on models, trained on a billion garment images, paired with a pose library tuned for fashion editorial and a garment-physics layer that knows how a cap-sleeved crop top actually drapes on a real body. That's why they read the garment's distinctive details from the source photo and added dedicated shots for them. The whole initial generation cost five credits. No human approval anywhere in the chain until "ready to publish."
That's what "agentic e-commerce" actually means: not faster human work, but the work itself, done.
Why it works now
Two things had to be true for this to be possible.
First, the agent needs discoverable, well-described tools, not a REST API and a 200-page integration guide. MCP delivers this. The agent sees seventeen tools (list_models, upload_garment_image, analyze_garment, create_product, generate_listing, tweak_pose, update_listing_content, publish_to_shopify, and so on), each with a short description, a typed input schema, and a clear contract. The agent picks the right tool by reading the descriptions the same way a junior engineer reads a function signature. No glue code.
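To make "a short description, a typed input schema" concrete, here is a minimal sketch of what one tool entry might look like to an agent, and how a call could be checked against it. MCP tool listings carry a name, a description, and a JSON Schema under inputSchema. The tool name comes from this post, but the imageUrl argument, the description text, and the validate_args helper are assumptions for illustration, not roopafy's actual schema.

```python
# Hypothetical descriptor: only the tool name is taken from the post.
# The "imageUrl" argument and the description text are assumed.
UPLOAD_TOOL = {
    "name": "upload_garment_image",
    "description": "Fetch a garment photo from a public URL and stage it.",
    "inputSchema": {
        "type": "object",
        "properties": {"imageUrl": {"type": "string"}},
        "required": ["imageUrl"],
    },
}

# Only the few JSON Schema types this sketch needs.
_JSON_TYPES = {"string": str, "boolean": bool, "object": dict}


def validate_args(tool: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call is well-formed."""
    schema = tool["inputSchema"]
    problems = [
        f"missing required argument: {name}"
        for name in schema.get("required", [])
        if name not in args
    ]
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown argument: {name}")
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            problems.append(f"{name} should be of type {spec['type']}")
    return problems
```

The point of the typed schema is that a malformed call can be rejected before any work starts, which is what lets an agent self-correct from the error message alone.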
Second, the platform behind the tools has to be built to be driven. This is where most retrofits fall over. roopafy's original wizard was a sequence of steps — upload → categorize → configure → generate → review → publish. We built that sequence as composable services from day one, because we knew the wizard was just the first client. And what's underneath isn't a general-purpose image API — it's our own virtual try-on models, trained on a billion garment images, with a fashion-editorial pose library and a garment-physics layer baked in. Fabric drape, sleeve break, body movement, the difference between a catalog frame and a lifestyle scene — the agent doesn't have to prompt-engineer around any of it. The framework already knows. So exposing it to an agent wasn't a rewrite; it was a translation. Seventeen MCP tools, one per meaningful action, sitting in front of the same agentic framework the dev console uses. The agent isn't going through a special "API mode" — it's literally using the same orchestration.
A few architectural choices that mattered:
- Server-fetched images. Image bytes never travel through the agent's token stream. The agent passes a URL; roopafy's server fetches and stages it in our CDN. Why this matters: a single 100 KB image is roughly 30,000 tokens of base64. An agent that has to emit that as a tool argument stalls before the call dispatches. We learned this the hard way.
- Per-session spend caps. An agent that goes off the rails can't drain your account. You pass an optional X-MCP-Spend-Cap: 50 header at session start and the credit counter enforces it atomically across every charging call in that session.
- Personal access tokens, hashed at rest. Long-lived, individually revocable, scoped to one account. Mint them in your profile, paste them into the agent, revoke them whenever.
- Normalized status polling. When generation kicks off, the agent calls get_generation_status(productId) and gets back a clean { done, failed, progressPct }. No parsing of internal lifecycle strings. It either keeps polling or it doesn't.
- Sane defaults and actionable errors. If you forget to pick a photo template, the tool rejects up front (TEMPLATE_REQUIRED) with the exact next step, instead of running for nine minutes and failing partway through generation.
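The per-session spend cap can be pictured as a check-and-charge done under a single lock, so two concurrent tool calls cannot both slip under the limit. This is a minimal in-process sketch and not roopafy's implementation: the class and method names are hypothetical, and a real service would more likely enforce this in its datastore.

```python
import threading


class SessionSpend:
    """Hypothetical per-session credit counter with a hard cap."""

    def __init__(self, cap: int) -> None:
        self.cap = cap
        self.spent = 0
        self._lock = threading.Lock()

    def try_charge(self, credits: int) -> bool:
        """Charge atomically; refuse if the charge would exceed the cap."""
        with self._lock:
            if self.spent + credits > self.cap:
                return False
            self.spent += credits
            return True
```

With a cap of 50, a five-credit generation goes through, and any later call that would push the session total past 50 is refused rather than partially charged.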
These aren't features you'd brag about on a landing page. They're the difference between an MCP server an agent can actually use and one that produces a beautiful demo and a thousand support tickets.
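The token cost of inlining an image, mentioned in the first bullet above, can be estimated with back-of-envelope arithmetic. Base64 emits 4 characters for every 3 bytes; the characters-per-token ratio below is an assumption (it varies by tokenizer), so treat the result as an order of magnitude rather than a measurement.

```python
import math


def base64_chars(num_bytes: int) -> int:
    """Length of the padded base64 encoding of num_bytes bytes."""
    return 4 * math.ceil(num_bytes / 3)


def estimated_tokens(num_bytes: int, chars_per_token: float = 4.0) -> int:
    """Rough token count; chars_per_token is an assumed ratio."""
    return math.ceil(base64_chars(num_bytes) / chars_per_token)
```

A 100,000-byte image comes to about 133,000 base64 characters, which lands in the tens of thousands of tokens under any plausible ratio. Passing a URL instead costs a few dozen.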
Set it up in five minutes
You need a roopafy account on the Pro plan and Claude (Desktop or Cowork).
- Mint a token. Sign in to roopafy → User Profile → MCP Access. Give the token a name (e.g. "Claude — laptop"). Click create. The raw token (rfy_…) is shown once; copy it. We store only a SHA-256 hash, so we can't show it again, even to you.
- Add roopafy to Claude. Open ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) and add roopafy as a server. Claude Desktop's native config doesn't take a remote URL plus a bearer header directly, so we bridge through mcp-remote:

      {
        "mcpServers": {
          "roopafy": {
            "command": "npx",
            "args": [
              "-y",
              "mcp-remote",
              "https://mcp.roopafy.com/mcp",
              "--header",
              "Authorization: Bearer rfy_your_token_here"
            ]
          }
        }
      }

  If the mcpServers block already exists (preferences live in the same file), add roopafy as a sibling; don't nest it. We tripped on this ourselves: it's easy to miss the comma and end up with invalid JSON that fails silently.
- Restart Claude, fully. Cmd-Q, not just close the window. The config is read once on launch.
- Smoke test. Open a fresh chat and ask:

      roopafy: who am I?

  If the agent comes back with your email and tier, you're connected. If you see "tool not found," the config didn't parse; check the JSON.
- First real run. Find any public image URL of a garment (a product page, a Cloudinary link, anywhere). Then:

      Create a roopafy listing from this garment: <URL>. Use the Classic photo template.

  Walk away for ~5 minutes. Come back to a full listing.
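If hand-editing the JSON feels error-prone, the config step can be scripted. The sketch below parses the existing file (so invalid JSON fails loudly instead of silently), adds roopafy as a sibling under mcpServers, and writes the result back. The helper names are hypothetical, the path is the macOS one from the step above, and the token is a placeholder; back up the file first if it holds settings you care about.

```python
import json
from pathlib import Path

# macOS location named in the setup steps.
CONFIG_PATH = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"


def add_roopafy(config: dict, token: str) -> dict:
    """Return a copy of config with roopafy added as a sibling under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["roopafy"] = {
        "command": "npx",
        "args": [
            "-y",
            "mcp-remote",
            "https://mcp.roopafy.com/mcp",
            "--header",
            f"Authorization: Bearer {token}",
        ],
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, token: str) -> None:
    """Parse, merge, and rewrite the config; json.loads raises on invalid JSON."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_roopafy(existing, token), indent=2) + "\n")
```

Calling update_config_file(CONFIG_PATH, "rfy_your_token_here") and then fully restarting Claude should be equivalent to the manual edit, with other servers and preferences left in place.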
A note on plans. MCP access is a Pro plan feature. Basic users see an upgrade prompt instead of the token form. If your agent is going to drive your catalog, it should be on the plan that takes catalog seriously.
Use Cowork for the real flow
Claude Desktop is fine for trying tools one at a time. For real work — where the agent decides what to do next, runs the pipeline in a loop, polls progress, and shows you the result — use Claude Cowork. It's the place where the agent thinks for itself. Add the same MCP server in Cowork's settings and you're driving a multi-step pipeline from a single prompt.
The difference is real. In Desktop you'll find yourself nudging the agent through each step. In Cowork the agent owns the loop.
What isn't perfect yet — the honest list
We shipped, but a few things are still rough.
Inline image rendering in Cowork. Cowork renders rich artifacts inside a sandboxed iframe with a strict content-security policy. Our CDN's URLs aren't on its allowlist, so when the agent tries to show you the generated shoot in a chat tile, the images come up broken. We have a workaround live: get_product(includeThumbnails: true) returns small base64 thumbnails the agent can embed inline as data: URIs. The agent only fetches them when a human asks to see the shoot, so autonomous flows skip the cost entirely. Cowork will eventually allowlist more hosts or the data: render path will mature; meanwhile, this works today.
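Embedding a thumbnail as a data: URI is a one-line wrap once the base64 payload is in hand. The sketch below assumes the thumbnail arrives as a plain base64 string and that it is a JPEG; neither the response shape of get_product nor the MIME type is stated in this post, so both are assumptions.

```python
import base64


def to_data_uri(b64_payload: str, mime_type: str = "image/jpeg") -> str:
    """Wrap an already-base64-encoded thumbnail in a data: URI."""
    # Fail early on a corrupted payload instead of rendering a broken image.
    base64.b64decode(b64_payload, validate=True)
    return f"data:{mime_type};base64,{b64_payload}"
```

Because the image bytes live inside the URI itself, the sandboxed iframe never has to fetch from a host outside its allowlist.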
Local image attachments in Desktop. If you drag an image file into the chat (instead of passing a URL), the agent has the file in its sandbox but no clean way to get the bytes to us. The sandbox is network-restricted; emitting the file as base64 tool arguments stalls the model before the call ever dispatches. The dependable path right now is: host the image somewhere public (Cloudinary, S3, a Gist, anywhere a URL works), then pass the URL. We're exploring a "request upload ticket" tool that hands the agent's sandbox a short-lived signed upload URL so it can PUT the bytes directly to our CDN, bypassing the model entirely. Watch this space.
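For a sense of what the upload-ticket flow would mean on the sandbox side, here is a sketch of preparing a direct PUT to a signed URL. The tool does not exist yet, so everything here is hypothetical: the function name, the content type, and the assumption that the ticket is simply a URL that accepts a raw-bytes PUT.

```python
import urllib.request


def build_upload_request(
    signed_url: str,
    image_bytes: bytes,
    content_type: str = "image/jpeg",
) -> urllib.request.Request:
    """Prepare a PUT of raw bytes to a short-lived signed URL (not sent here)."""
    return urllib.request.Request(
        signed_url,
        data=image_bytes,
        method="PUT",
        headers={"Content-Type": content_type},
    )
```

Sending it would be a single urllib.request.urlopen(req, timeout=30) call. The design point is that the bytes go straight from the sandbox to storage, so the model only ever handles the short URL.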
Shopify publish timing. Pushing a multi-image draft to a connected Shopify store can take longer than the default MCP request timeout. We're moving publish to the same async-and-poll pattern the rest of the pipeline already uses. In the meantime, the listing is fully publishReadiness: ready on the roopafy side; the handoff to Shopify is the wobbly seam.
Generation time. A full seven-shot listing takes 3–8 minutes. The agent polls progress and surfaces the result; it doesn't sit blocking. But if you're impatient, watch for the Studio Front shot to land first — that one's a fast tell whether the model identity and garment fit are right.
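The polling behavior can be sketched as a small loop over get_generation_status and its { done, failed, progressPct } result. The call_tool parameter is a hypothetical stand-in for whatever MCP client the agent uses, and the interval and timeout defaults are assumptions chosen to comfortably cover a run of several minutes.

```python
import time
from typing import Callable


def wait_for_generation(
    call_tool: Callable[[str, dict], dict],
    product_id: str,
    interval_s: float = 10.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_generation_status until done or failed, or raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = call_tool("get_generation_status", {"productId": product_id})
        if status["done"] or status["failed"]:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"generation still running at {status['progressPct']}%"
            )
        sleep(interval_s)
```

Because the status is already normalized, the loop needs no knowledge of internal lifecycle states: two booleans decide whether to stop, and progressPct is only used for reporting.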
None of these are blockers. They're the shape of "shipped and improving."
Where this goes — the refinement loop
One-shot listings are the easy half. The interesting half is the continuous loop, and it's already wired.
A typical week for a brand running on roopafy looks like this. The agent watches Shopify analytics on a schedule. A SKU's lifestyle scene (say, the "kitchen morning" shot from a women's tee) starts under-performing on click-through compared to sibling SKUs. The agent doesn't regenerate the whole shoot (expensive, wasteful, and most of the gallery is fine). It calls tweak_pose on that one image with a targeted instruction:
"swap to evening warmth — soft amber light, hands in pockets, more relaxed posture"
Figure: one image before and after a tweak_pose call. Before: serious, low energy. After: head thrown back laughing. From an actual autonomous run: the agent flagged its own Café Seated frame as "serious, low energy," issued the tweak, and re-ran only that one image.
One credit. The single image regenerates with the same model, the same garment, the same brand voice — just a new mood for that one slot. The agent re-publishes the draft, watches the next window's signal, decides whether to keep the new variant or revert. That's single-image enhancement as a primitive — not "regenerate everything and hope," but a scalpel.
Pair that with tweak_field to rewrite a flat-performing SEO title in a sharper voice ("warmer, mention the linen, lead with the use case") and update_listing_content to refresh taxonomy attributes when Shopify category data shifts, and you have a system that learns its catalog. Daily. Per-SKU. Per-image. The seller decides only the brand voice and the budget cap; the agent decides everything else, and revises its decisions based on what actually converts.
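The "which image gets the scalpel" decision can be sketched as a comparison against sibling performance. This is one plausible rule, not a description of how roopafy's agent decides: the function name, the use of the median as a baseline, and the 0.7 threshold are all assumptions for illustration.

```python
from statistics import median


def images_to_tweak(
    ctr_by_image: dict[str, float],
    sibling_ctrs: list[float],
    threshold: float = 0.7,
) -> list[str]:
    """Return image names whose CTR falls below threshold x the sibling median."""
    baseline = median(sibling_ctrs)
    return sorted(
        name for name, ctr in ctr_by_image.items() if ctr < threshold * baseline
    )
```

Each name returned would become one tweak_pose call, so the cost of a refinement pass scales with the number of weak images rather than the size of the gallery.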
This is what we mean when we say roopafy is built for agentic e-commerce. Not "AI tools that help you do your work." Agents that do the work — continuously, at the granularity of a single pose, paid for by the conversion lift they produce.
If you're a small fashion brand and shoot-day logistics have become the bottleneck on your growth, give the MCP server a few minutes. The setup above is the entire onboarding.
If you hit something in the rough list above, or something we haven't found yet, write to us. We read everything, and the limitations move fast when they're in front of us.
Mason, Chief Architect, roopafy