Edge Forward | Workload Placement Tokenomics

Your scenario

Every assumption is exposed on purpose; the model is yours to stress.

Deployment type

Autonomous agents

Chatbot / single-turn

Agents loop; chatbots answer once. Seat subscriptions cover chatbots; this model prices agents, which do not fit a flat rate.

Fleet and usage

Fleet size, seats

Tasks per user per day, Year One

Placement boundary: the local bet

How much of the complex, expensive work moves onto local hardware over time. These are the model's load-bearing assumptions. Set them low and the hybrid loses; the model will say so.

Complex work running locally by Year Two, percent

Complex work running locally by Year Three, percent

The skeptic's case halves adoption growth, cuts the local bet to a fraction of the default, and charges the Copilot+ NPU refresh back to the strategy. If the model could not lose, it would not be worth trusting when it wins.
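As a rough sketch, the skeptic's case can be read as a transformation of the default scenario. The field names below are hypothetical stand-ins for the sliders on this page. The size of the cut and the refresh premium charged back are not stated here, so both are left as parameters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    # Hypothetical field names mirroring the sliders on this page.
    adoption_growth_pct: float   # employee adoption growth per year
    local_y2_pct: float          # complex work running locally by Year Two
    local_y3_pct: float          # complex work running locally by Year Three
    refresh_premium_usd: float   # Copilot+ refresh premium per seat


def skeptic_case(default: Scenario, local_cut: float,
                 refresh_premium_usd: float) -> Scenario:
    """Derive a skeptic's scenario from the default one.

    local_cut is the fraction of the default local bet that survives
    (an assumption; the page does not publish the value).
    """
    return replace(
        default,
        adoption_growth_pct=default.adoption_growth_pct / 2,  # halve adoption growth
        local_y2_pct=default.local_y2_pct * local_cut,        # cut the local bet
        local_y3_pct=default.local_y3_pct * local_cut,
        refresh_premium_usd=refresh_premium_usd,              # charge the refresh back
    )
```

The numbers you pass in are yours; nothing in this sketch reflects the engine's actual defaults.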

Advanced assumptions

The dials most visitors never need: demand growth, cloud price deflation, the routine versus complex split, and the hardware premiums. Nothing in here is hidden from the math, and the chips above show the current values even while this stays closed; the methodology panel in the results shows exactly what each one does.

Demand and pricing

Employee adoption growth per year, percent

Cloud price drop per year, percent

Routine share of all workflows, percent

Business days per year

The routine share is yours to set; the practice does not assert a fixed split.

Hardware: what the strategy pays for

Copilot+ refresh premium per seat, dollars

RTX Spark devices, percent of fleet

RTX Spark device cost per device, dollars

What do you want to see first?

Budget impact first

Security and privacy first

This reorders the results only. The numbers do not change.

Net 3-yr savings vs discounted cloud

Payback

Year 3 hybrid run-rate

Year 3 discounted cloud run-rate

Request an Envisioning Workshop

Three plans, three years

Total spent to date under each plan, with the hybrid carrying its full hardware bill from day zero. The moment the green line crosses below a cloud line is the payback moment, marked on the chart. This is the most honest view, so it is the default.
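One way to read the payback moment, as a minimal sketch: accumulate spend per period for each plan and find the first period in which the hybrid's running total falls below the cloud's. The function names and the period granularity are assumptions for illustration; the engine's own calculation may differ.

```python
from itertools import accumulate


def payback_period(hybrid_spend, cloud_spend):
    """Return the index of the first period in which cumulative hybrid
    spend is below cumulative cloud spend, or None if it never is.

    Each argument is a list of spend per period. The hybrid list should
    carry its full hardware bill in the first entry.
    """
    hybrid_total = accumulate(hybrid_spend)
    cloud_total = accumulate(cloud_spend)
    for period, (h, c) in enumerate(zip(hybrid_total, cloud_total)):
        if h < c:
            return period
    return None
```

With illustrative figures, `payback_period([500, 50, 50, 50, 50, 50], [150] * 6)` finds the crossing in the fifth period (index 4), where the running totals are 700 against 750.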

The summary a CFO reads first

What each number means

The link carries every slider setting. Whoever opens it sees these numbers recomputed by the engine, not a screenshot.
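A sketch of how a link can carry a scenario, assuming the settings travel as query parameters. The base URL and the parameter names are hypothetical; the page does not document its actual link format.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl


def scenario_link(base_url, settings):
    """Encode slider settings into a shareable link (sorted for stability)."""
    return f"{base_url}?{urlencode(sorted(settings.items()))}"


def settings_from_link(link):
    """Recover the slider settings so the numbers can be recomputed."""
    return {key: float(value) for key, value in parse_qsl(urlsplit(link).query)}
```

Because only the inputs travel in the link, the recipient's figures come from a fresh computation rather than from anything stored in the link itself.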

What runs where, year by year

The green share runs on devices you already own, with no per-task operating cost. The rest is your remaining cloud bill, priced at premium cloud rates after each year's price drop.
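The split can be sketched as follows, under two assumptions that the page does not spell out: the local share is a fraction of total task volume, and the price drop compounds annually starting from Year One rates. All names are illustrative.

```python
def remaining_cloud_bill(tasks_per_year, local_share, year_one_cost_per_task,
                         price_drop_pct, year):
    """Cloud bill for the work that does not run locally.

    local_share is a fraction between 0 and 1; year is 1-based.
    Local tasks are treated as having no per-task operating cost.
    """
    # Assumed: the price drop compounds once per year after Year One.
    rate = year_one_cost_per_task * (1 - price_drop_pct / 100) ** (year - 1)
    cloud_tasks = tasks_per_year * (1 - local_share)
    return cloud_tasks * rate
```

Raising `local_share` shrinks the bill directly; raising `price_drop_pct` shrinks it for every plan, which is why the price drop favors the cloud baselines as well.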

Your numbers, in plain English

Generated by the engine from your exact scenario. This is the spoken version of the chart.

Seven questions every agent must answer

Keeping data home is necessary, not sufficient. To deploy agents safely at enterprise scale, your security team needs answers to seven questions, and the hybrid plan answers all seven with security software your company already owns and operates.

Control	The question it answers	Answered by
Identity	Who or what is this agent acting for?	Entra ID, Hello for Business
Data access	What files and data is it allowed to see?	Purview, Information Protection
Action limits	What is it allowed to do on the device?	MXC isolation, Defender policies
Location	Is this loop running locally, or sending data out?	MXC, Foundry Local, Azure AI
Audit trail	Where is its action history recorded?	Defender, Sentinel, Purview Audit
Escalation	When is a task too big for the device, and who decides?	Practice-defined routing policy
Kill switch	How does IT shut a misbehaving agent down, instantly?	Intune, Entra Conditional Access

Methodology, in the open

An AI agent solves a task in steps, and because the cloud does not remember the previous step, it charges you to re-read the whole conversation at every one of them, even after caching discounts. Call it the memory tax. This model compares three ways to pay for the same work over three years: the standard cloud plan most companies have, the discounted cloud plan a sharp architect would build, and a hybrid plan that moves work onto devices you already own.

Every figure on this page is computed by the engine from your scenario at request time. Nothing is hand-typed. The ways this model deliberately favors the cloud baselines are published in full below.

The memory tax, in plain terms

An AI agent solves a task in steps: plan, act, check, repeat. The cloud does not remember the previous step, so the agent sends the entire conversation back up on every step, and the cloud charges to re-read it every time. Most of what you pay for is not the answer. It is the agent re-reading its own notes. Caching discounts that re-reading; it does not eliminate it.

For the architects, the precise mechanics we model: on the first turn the full context is written to the provider's cache at the write-premium rate. On every later turn the accumulated history is billed at the cache-read rate, and only the new tail (this turn's tool result plus the prior turn's output) is written at the premium rate. Output tokens are billed at the output rate on every turn. That is the entire cost mechanism; nothing else is modeled.
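The mechanics above can be sketched as a per-task cost function. This is a minimal illustration under the perfect-caching assumption, not the engine's code. The parameter names are hypothetical, token counts are assumed constant per turn, and rates are per token in whatever currency unit you choose.

```python
def task_cost(turns, context_tokens, tool_tokens, output_tokens,
              write_rate, read_rate, output_rate):
    """Cost of one agent task that runs for a given number of turns."""
    total = 0.0
    history = 0  # tokens already held in the provider's cache
    for turn in range(turns):
        if turn == 0:
            # First turn: the full context is written at the premium rate.
            read, written = 0, context_tokens
        else:
            # Later turns: re-read the history, write only the new tail
            # (this turn's tool result plus the prior turn's output).
            read, written = history, tool_tokens + output_tokens
        total += (read * read_rate
                  + written * write_rate
                  + output_tokens * output_rate)
        history += written
    return total
```

The read term grows with every turn because the history keeps accumulating. That growing term is the memory tax: caching lowers its rate but does not remove it.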

What this model prices, and what it does not

People are billed by the seat because people are slow and predictable. Agents are billed by the token because they are neither. Your seat subscriptions (Copilot and its peers) cover the first kind of work; this model does not price them and does not argue against them. Keep your seats and keep your laptops; both are table stakes. The decision priced here is where the agents run.

Spend type	What it covers	What happens at agent scale	In this model
Seat subscriptions	Assistive, human-in-the-loop work: drafting, summarizing, asking	Caps and throttles appear; providers are already trimming the usage included per seat	Kept, not priced, not argued against
Metered cloud APIs	Autonomous agents: loops that run without a human per turn	Volume compounds with adoption; the meter is where growth lands	The red and amber lines
Local API calls	The same agentic work, served from an endpoint you own	Same API shape; placement changes the meter, not the code	The green line

Where this model favors the cloud

This model is biased against its own conclusion.

The cloud baselines are granted perfect caching: every turn is assumed to hit the cache, and no cache entry ever expires between turns. Real deployments miss caches and pay re-write costs. The cloud baselines also receive the annual price drop set in your scenario, applied in full and on schedule, even though the current memory squeeze is pressing provider costs in the other direction. The hybrid line is charged its full hardware bill up front, in Year One, with no residual value credited. The cloud lines on this page are floors, not estimates.

What the strategy is charged for

Only hardware the strategy requires is billed to it. The Copilot+ NPU refresh is priced at zero by default because fleets are getting that tier regardless of any AI decision; the slider under Advanced assumptions charges it back if you disagree. The RTX Spark tier, the hardware this page actually prices, is charged in full in Year One.
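As a sketch of that charge, assuming the three hardware sliders combine in the obvious way. The function and parameter names are hypothetical, and rounding the device count up is an assumption, not something the page states.

```python
import math


def year_one_hardware_bill(seats, refresh_premium_usd,
                           spark_pct_of_fleet, spark_cost_usd):
    """Hardware charged to the hybrid strategy, all of it in Year One."""
    # Copilot+ refresh premium: zero by default, per seat if charged back.
    refresh = seats * refresh_premium_usd
    # RTX Spark devices as a percentage of the fleet, rounded up (assumed).
    spark_devices = math.ceil(seats * spark_pct_of_fleet / 100)
    return refresh + spark_devices * spark_cost_usd
```

No residual value is credited in this sketch, matching the up-front treatment described on this page.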

What the hardware actually is

Tier	Device class	What it runs	Which workload	Who gets one
Table stakes	Copilot+ NPU laptop; your refresh delivers these regardless	Small local models	The routine tier, once routing exists	Every seat
The decision	RTX Spark class, high-memory: deskside box, shared workstation, or dedicated machine	Large open-weights models	The complex tier the absorption sliders move	Developers, analysts, creators, or a department sharing one

What is an assumption, and whose

The local absorption rates for Years Two and Three are scenario assumptions under your control, not predictions. Your current scenario routes the routine share you set to the routine tier and absorbs the complex tier locally at the Year Two and Year Three rates you set. Set the absorption sliders to zero and the hybrid strategy loses; the model will say so.

Rate cards

What one task costs in the cloud, Year One

Generated from the engine's unit costs at these rate cards, so you can see where the money concentrates before any scale is applied.

Trace it yourself

View the raw engine response that rendered this page

Every figure above is a field in this response. If you find one that is not, the build is broken and we want to know.

This page prices one evaluator of six

Inference economics is the evaluator a calculator can quantify. The placement decision also turns on five more: data-handling constraints (privacy, security, sovereignty), local context and content, proximity to sensors, latency and offline capability, and sustainability. Scoring your actual workloads against all six, and placing each one across on-device, on-prem desktop, on-prem departmental, or cloud, is the Workload Routing Workshop. The math on this page is the part of that workshop you can run without us.

A calculator shows you the math. We can show you the loop running live, on one device, against your scenario, with zero cloud inference, in fifteen minutes.

Book the free executive briefing

Request an Envisioning Workshop

The briefing costs you an hour and nothing else. The workshop is where we score your workloads against all six evaluators.

Agents loop. The meter runs on every turn.