woolly.me
finops series

The attribution problem, or why "just tag everything" fails

Tags are the standard answer to AWS cost allocation and they've never once been enough on their own. What a workable attribution layer actually looks like.

At the end of the series opener I said “just tag everything” has never survived contact with a real AWS organization. Here’s the autopsy, and what to build instead.

The advice keeps getting given because it sounds complete. Pick a tag policy, apply it everywhere, group the bill by tag, done. I’ve watched versions of this kick off at a healthcare enterprise and at startups, and I’ve run cost attribution across 50+ microservices at Postscript. The tags-only version fails the same four ways every time.

The four failure modes

Tags don’t apply backward. Cost-allocation tags only attribute spend from the moment they’re activated; last quarter stays dark forever. Every attribution project inherits a pile of history it can never explain, which is also the argument for turning the machinery on long before you think you need it.

A large slice of the bill can’t be tagged in any useful way. Data transfer is the classic: it’s real money at scale, and it belongs to flows between systems, not to a resource you can label. The same holds for support plans, NAT gateway data processing, marketplace charges, and CloudWatch ingestion from a shared account: in each case the ownership question isn’t answerable by a key-value pair on a resource.

Enforcement decays. The week the tag policy ships, coverage is great. Six months later there are new services, an acquisition, three engineers who never read the wiki, and coverage has rotted. Tagging via IaC modules helps enormously (every resource in our Terraform carries its cost tags because the module injects them, not because anyone remembers), and SCP-level tag enforcement helps more. But anything that depends on humans remembering keys is in permanent decline.
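
One way to watch that decay instead of discovering it is to measure coverage on a schedule. What follows is a minimal sketch, assuming a resource inventory has already been exported as a list of dicts with a `tags` mapping. The required keys (`team`, `service`, `env`), the function name `tag_coverage`, and the sample inventory are all hypothetical and do not come from any AWS API.

```python
from typing import Iterable, Mapping

# Hypothetical required keys; substitute whatever your tag policy mandates.
REQUIRED_TAGS = ("team", "service", "env")


def tag_coverage(resources: Iterable[Mapping], required=REQUIRED_TAGS) -> dict:
    """Per required key, the fraction of resources carrying a non-empty value."""
    resources = list(resources)
    if not resources:
        return {key: 0.0 for key in required}
    coverage = {}
    for key in required:
        # An empty string counts as missing: it groups nothing on the bill.
        tagged = sum(1 for r in resources if r.get("tags", {}).get(key))
        coverage[key] = tagged / len(resources)
    return coverage


# Made-up inventory for illustration only.
inventory = [
    {"id": "i-001", "tags": {"team": "payments", "service": "api", "env": "prod"}},
    {"id": "i-002", "tags": {"team": "payments", "env": "prod"}},
    {"id": "vol-003", "tags": {}},
    {"id": "db-004", "tags": {"team": "growth", "service": "etl", "env": ""}},
]
print(tag_coverage(inventory))
```

This counts resources, not dollars. Weighting each resource by its monthly cost would likely give a more honest number, since one untagged database can outweigh a hundred untagged volumes.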

And the big one: shared platforms. An EKS cluster running thirty teams’ workloads is, to the bill, one autoscaling group and one control plane fee. Perfect tag coverage on that cluster attributes 100% of the cost to “platform,” which is true and useless. The same goes for a shared MSK cluster or a shared NAT gateway. The more you invest in platform consolidation (which you should), the smaller the fraction of your bill that resource tags can meaningfully divide.

What works: layers, not labels

Attribution that survives is built in layers, coarse to fine.

Account boundaries first. The account is the only attribution mechanism enforced by physics: spend in the account is in the account’s bill, no discipline required. If your org map roughly matches your account map, most attribution is free. This is the strongest argument for multi-account structure that nobody makes when the accounts are being created.

Tags second, for what they’re good at: dividing spend inside an account among a handful of owners, injected by IaC so coverage doesn’t depend on memory. Keep the taxonomy small. Three tags that are always present beat eleven that are usually missing.

Usage-based allocation third, for the shared platforms. Pod-level requests and usage (Kubecost, OpenCost, or the CUR’s split-cost allocation) divide the EKS bill; per-topic throughput divides the Kafka bill. This layer is real engineering work, which is why it gets skipped, and it’s where the money hides in any platform-heavy org.
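
The arithmetic behind this layer is a proportional split. The sketch below assumes you already have one usage number per team for the period (CPU core-hours requested, bytes per topic, whatever key you chose). The function name `split_shared_cost`, the team names, and the dollar figures are hypothetical, and this is not how Kubecost, OpenCost, or the CUR compute their numbers internally.

```python
def split_shared_cost(total_cost: float, usage_by_team: dict[str, float]) -> dict[str, float]:
    """Divide total_cost among teams in proportion to their measured usage."""
    total_usage = sum(usage_by_team.values())
    if total_usage <= 0:
        # Nothing measured: leave the whole amount unattributed rather than guess.
        return {"unattributed": total_cost}
    return {
        team: total_cost * usage / total_usage
        for team, usage in usage_by_team.items()
    }


# Hypothetical month: a $12,000 cluster bill split by CPU core-hours requested.
cpu_request_hours = {"payments": 3000.0, "growth": 2000.0, "search": 1000.0}
shares = split_shared_cost(12_000.0, cpu_request_hours)
print(shares)
```

Note the design choice hidden in the denominator: dividing by the sum of measured usage spreads idle cluster capacity across teams in proportion to what they used. Dividing by total cluster capacity instead would leave idle cost as its own line, which may be the more honest option for a platform team to own.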

Last, an explicit “unattributed” bucket, published and shrinking. Not zero. The teams that insist on 100% attribution end up smearing shared costs across teams by headcount, and the first time a team gets charged for something it can’t control, the whole system loses credibility. An honest 10% remainder keeps the other 90% trusted.

Where to start Monday

Turn on the Cost and Usage Report and activate cost-allocation tags now, because both start recording from activation. Build the account-to-owner map before any tag work. Then take the top twenty line items on last month’s bill and attribute those, specifically, by whatever layer fits. Twenty line items is usually most of the spend, and “most of the spend, this month” beats “all of the spend, someday” every time it’s tried.
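
Whether twenty line items really is most of the spend on your bill is easy to check before committing to the plan. A small sketch, assuming last month's bill has been reduced to `(description, cost)` pairs; the function name `top_line_items` and the sample figures are hypothetical.

```python
def top_line_items(line_items: list[tuple[str, float]], n: int = 20):
    """Return the n largest line items and the share of total spend they cover."""
    total = sum(cost for _, cost in line_items)
    ranked = sorted(line_items, key=lambda item: item[1], reverse=True)[:n]
    covered = sum(cost for _, cost in ranked)
    share = covered / total if total else 0.0
    return ranked, share


# Made-up bill for illustration only; n=3 keeps the example short.
bill = [
    ("EC2 compute", 4000.0),
    ("RDS", 2500.0),
    ("Data transfer", 1500.0),
    ("S3", 1200.0),
    ("CloudWatch", 500.0),
    ("Route 53", 300.0),
]
top, share = top_line_items(bill, n=3)
print([name for name, _ in top], f"{share:.0%} of spend")
```

If the share comes back low, the bill is flatter than usual and the cutoff probably needs to move, but the approach of working down a ranked list stays the same.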

Next in the series: purchase commitments, and why buying Savings Plans before the cleanup locks your waste in at a discount.

Questions this raises

What's the fastest way to start attributing AWS spend?
Turn on the Cost and Usage Report and cost-allocation tags today (both only record from the moment they are switched on, so every week of delay is a week of unattributable history), then map accounts to owners before touching a single tag. In a multi-account org, the account boundary alone usually attributes the majority of spend.
How do you split the cost of a shared EKS cluster?
By measured usage, not by tags. The cluster is one line item on the bill no matter how well the workloads inside it are labeled, so you need pod-level requests/usage data (Kubecost, OpenCost, or CUR split-cost allocation for EKS) to divide it. Decide the split key deliberately: requests punish over-provisioning, usage punishes nobody, and that choice changes team behavior.
Should unattributed spend be zero?
No, and chasing zero is how attribution projects die. Data transfer, support plans, and shared plumbing resist clean ownership. Get the unattributed bucket under roughly 10% and falling, publish it as its own line, and spend your energy on the top line items instead.
