ARM Servers vs x86: How Edge ARM Servers with NVMe Storage and Cloudflare Could Disrupt Cloud Compute Economics

Deploying edge ARM servers with NVMe storage behind Cloudflare's network can reduce cloud compute costs by 40–70% compared with equivalent AWS instances. ARM's power efficiency lowers hardware overhead, NVMe removes storage latency bottlenecks, and Cloudflare absorbs the egress charges that inflate AWS bills most sharply at scale.

Article Summary

  1. What it is: An ARM vs AWS cost comparison showing how a 1,000-node edge deployment using ARM servers, NVMe storage, and Cloudflare can cost $50–100K/month versus $800K–$1.2M/month on AWS Graviton: a structural, not incremental, difference.
  2. Why it matters: Cloud storage pricing is the hidden breaking point. NVMe delivers 90,000 IOPS at under $100 per drive versus $130–$250/month per cloud node, making edge-based compute economics impossible for hyperscalers to match at scale.
  3. Key takeaway: Decoupling high-performance NVMe storage from cloud-based durability is the single most powerful lever for reducing infrastructure costs by an order of magnitude.

1. The uncomfortable starting point

If this model is even directionally correct, a large percentage of enterprise compute is structurally mispriced, and most organisations are paying a permanent premium for infrastructure characteristics they no longer use. Cloud pricing only makes sense when you actively exploit elasticity. The majority of production workloads have quietly become steady-state systems that run at predictable utilisation for months or years, and once that happens you are no longer paying for flexibility. You are paying for insurance you are not claiming against.

This creates a slow but compounding inefficiency that is easy to ignore because nothing breaks. The economics drift further from reality over time, yet budget cycles absorb the drift and infrastructure teams optimise within the model rather than questioning it. The risk is no longer that you move too early. The real risk is that your competitors stopped paying this tax three years ago while you continued benchmarking EC2 instance families.

2. The question the industry is still avoiding

ARM is no longer a debate about performance, because that argument was settled in production years ago across hyperscalers, enterprise systems, and edge deployments. AWS Graviton3 delivers roughly forty percent better price-performance than equivalent x86 instances on the same platform. Ampere Altra powers a growing share of OCI and Hetzner deployments at costs that make Graviton look expensive by comparison. Apple Silicon has demonstrated what a clean RISC implementation looks like when thermal constraints are treated as first-class design inputs.

The real question is not which architecture to use. It is where that compute should run and whether the operating model that cloud enforces is still appropriate for workloads that have become stable and predictable. Cloud solved a real problem when infrastructure was hard to operate, scaling required human intervention, and recovery from failure was slow and expensive. That world no longer exists in the same form. Automation, immutable infrastructure patterns, and rebuild-first recovery have closed much of the operational gap, yet cloud pricing still assumes that complexity and risk are high enough to justify a continuous premium. Most organisations have never explicitly challenged that assumption because no single renewal event makes the cost visible enough to force the question.

3. Architecture is now a power problem, not a performance problem

The difference between ARM and x86 is usually explained in terms of instruction set philosophy, with x86’s CISC heritage carrying decades of accumulated complexity that modern silicon must route around, and ARM’s RISC lineage enabling simpler decode pipelines and denser core layouts. At small scale, this reads like an engineering preference. At large scale, it determines how much compute you can physically deploy before you hit facility power and cooling limits.

| Dimension | x86 (CISC) | ARM (RISC) | Implication |
|---|---|---|---|
| Typical CPU TDP | ~250–350 W | ~100–180 W | ARM fits more compute per power circuit |
| Performance per watt | Baseline | +40% to +60% | ARM does more work for the same electricity spend |
| Core density per rack | Lower | +30% to +80% higher | ARM scales out without expanding physical footprint |
| Thermal profile | Higher cooling load | Lower cooling load | ARM reduces the PUE multiplier on every watt consumed |
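
To make the density claim concrete, here is a minimal sketch of the rack arithmetic using the TDP midpoints from the table. The rack power budget, per-server overhead, and the division itself are illustrative assumptions, not vendor figures.

```python
# Illustrative rack-density arithmetic using the TDP midpoints above.
# Every input here is an assumption for the sketch, not a vendor figure.

RACK_POWER_BUDGET_W = 17_300   # assumed usable power budget per rack
NON_CPU_OVERHEAD_W = 150       # assumed per-server draw beyond the CPU

def servers_per_rack(cpu_tdp_w: float) -> int:
    """Servers that fit the rack power budget at a given CPU TDP."""
    return int(RACK_POWER_BUDGET_W // (cpu_tdp_w + NON_CPU_OVERHEAD_W))

x86 = servers_per_rack(300)    # midpoint of the ~250-350 W range
arm = servers_per_rack(140)    # midpoint of the ~100-180 W range

print(f"x86: {x86} servers/rack, ARM: {arm} servers/rack "
      f"(+{(arm - x86) / x86:.0%})")
# -> x86: 38 servers/rack, ARM: 59 servers/rack (+55%)
```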

The so-what is not merely that ARM is cheaper to run. It is that data centre growth is already constrained by power availability in multiple regions, and AI workloads are accelerating that pressure faster than new capacity can be commissioned. In that environment, architectures that waste power are not just inefficient, they are physically unsustainable beyond a certain density threshold. Organisations that have not thought about this as a power problem are going to rediscover it as a capacity problem within the next planning cycle.

4. The convergence that breaks the default

Four trends have aligned in a way that individually looks incremental but together breaks the default assumption that cloud is the rational place to run everything.

ARM has reached price-performance parity with x86 at every tier from edge nodes to hyperscale. NVMe has eliminated storage as a bottleneck by delivering high IOPS locally at commodity cost, removing the architectural reason that compute and storage were decoupled in the first place. Cloudflare and equivalent edge platforms have absorbed global ingress, TLS termination, DDoS mitigation, and WAF into a globally distributed layer that sits in front of infrastructure regardless of where it runs. And infrastructure automation tooling, most of it open source, has reduced the operational burden of managing bare metal or colocation to a level that no longer requires hyperscaler abstraction to be viable.

The implication is that cloud is no longer the simplest and most rational choice by default. It is one option among several, and that shift forces a much more explicit decision about where each workload belongs. Organisations that have not yet done that classification are making implicit choices rather than deliberate ones, and implicit infrastructure choices tend to become expensive ones.
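
One way to make that classification explicit is a scoring pass over the workload portfolio, as in the sketch below. The criteria and thresholds are illustrative assumptions; the point is that placement becomes a recorded rule rather than an inherited default.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    steady_state: bool        # predictable utilisation for months at a time
    burst_ratio: float        # peak demand divided by average demand
    egress_tb_month: float    # monthly egress volume in TB
    needs_global_realtime: bool

def placement(w: Workload) -> str:
    """Illustrative placement rule: pay for cloud only where its properties are used."""
    if w.needs_global_realtime or w.burst_ratio > 3.0:   # assumed threshold
        return "cloud"
    if w.steady_state and w.egress_tb_month > 50:        # assumed threshold
        return "edge ARM + Cloudflare"
    return "edge ARM" if w.steady_state else "cloud"

print(placement(Workload("api-backend", True, 1.4, 120, False)))
# -> edge ARM + Cloudflare
```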

5. The baseline scenario that exposes the mismatch

Modelling a steady-state system honestly exposes a pricing difference that is structural rather than marginal, because it is driven by assumptions built into the cloud model rather than by actual usage patterns. The following comparison uses a workload equivalent to one thousand continuously running servers with moderate latency sensitivity and storage-heavy characteristics. Both sides are loaded with full operational cost including labour, tooling, and object storage retention.

| Component | Cloud Model (AWS-style) | ARM Colocation Model | Delta |
|---|---|---|---|
| Compute | $1.8M – $2.2M / month | $200K – $300K / month | -85% to -90% |
| Local storage | $300K – $600K / month | Included in hardware lease | -100% marginal cost |
| Snapshots and backups | $100K – $250K / month | $50K – $100K / month | -50% to -70% |
| Network egress | $400K – $900K / month | $100K – $250K / month | -60% to -80% |
| Operations and managed services | $200K – $500K / month | $50K – $150K / month | -60% to -75% |
| Total monthly | $2.7M – $4.2M | $400K – $800K | -75% to -85% |

The reason the delta is this large is not that cloud is overcharging for what it provides. It is that cloud charges for elasticity, multi-tenancy abstraction, and operational indirection whether those properties are used or not. For a workload that runs at predictable utilisation and never needs to scale unexpectedly, every dollar spent on those properties is a cost with no corresponding benefit. At this scale, the annual difference implied by the totals above is roughly between twenty-three million and forty-six million dollars. That is not a cost-saving exercise. That is a strategic decision about whether to fund an engineering capability or continue renting one indefinitely.
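
The same arithmetic in a few lines, taking the table's stated monthly totals as inputs; these are the article's modelled ranges, not quotes from any provider.

```python
# Annual gap implied by the table's stated monthly totals (USD).
# These are the article's modelled ranges, not provider quotes.
CLOUD_TOTAL = (2_700_000, 4_200_000)   # cloud model, monthly low/high
COLO_TOTAL = (400_000, 800_000)        # ARM colocation model, monthly low/high

# Most conservative gap: cheapest cloud against the dearest colocation.
conservative = (CLOUD_TOTAL[0] - COLO_TOTAL[1]) * 12
# Most aggressive gap: dearest cloud against the cheapest colocation.
aggressive = (CLOUD_TOTAL[1] - COLO_TOTAL[0]) * 12

print(f"annual gap: ${conservative/1e6:.1f}M to ${aggressive/1e6:.1f}M")
# -> annual gap: $22.8M to $45.6M
```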

The hardware anchor for this model is Ampere Altra-based servers, which are available from Hetzner, OVHcloud, and Oracle Cloud Infrastructure at price points that make the compute column above achievable without heroic procurement.

6. Storage is where the model collapses first

Storage economics expose the mismatch more clearly than any other component because cloud storage pricing treats performance as a premium to be purchased incrementally rather than a property of the hardware. EBS GP3 costs roughly $0.08 per GB-month for baseline performance, with additional per-IOPS charges beyond three thousand IOPS and per-throughput charges beyond 125 MB/s. NVMe storage in commodity servers costs approximately $0.03 to $0.05 per GB-month with no performance tiers, because the performance characteristics are determined by the device and not by a billing dimension.

| Feature | Cloud EBS (GP3) | Local NVMe | Implication |
|---|---|---|---|
| Baseline IOPS | 3,000 (provisionable to 16,000 at cost) | ~500K–1M+ at high queue depth | NVMe outperforms provisioned EBS at zero marginal cost |
| Sequential throughput | ~1,000 MB/s | ~3,000 – 7,000 MB/s | NVMe removes the throughput ceiling that shapes application design |
| Latency | 1–5 ms (network-attached) | 50–100 µs (device local) | Roughly a 95% improvement at the 99th percentile, not just at the mean |
| Cost per GB-month | $0.08 – $0.20 | $0.03 – $0.05 | 60% to 80% cheaper before the performance premium is added |
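
A sketch of that billing asymmetry is below. The gp3 prices used (storage at $0.08 per GB-month, $0.005 per provisioned IOPS-month beyond 3,000, $0.04 per MB/s-month beyond 125) are us-east-1 list prices to the best of my knowledge, and the drive price and lifetime are assumptions; verify against current AWS pricing before relying on the output.

```python
# gp3 monthly cost for one volume vs an amortised local NVMe drive.
# Prices are assumed us-east-1 list prices; check current AWS pricing.

def gp3_monthly(gb: int, iops: int, mbps: int) -> float:
    storage = gb * 0.08
    extra_iops = max(0, iops - 3_000) * 0.005      # beyond free baseline
    extra_tput = max(0, mbps - 125) * 0.04         # beyond free baseline
    return storage + extra_iops + extra_tput

def nvme_monthly(drive_cost: float, lifetime_months: int = 36) -> float:
    # Straight-line amortisation; the performance comes with the device.
    return drive_cost / lifetime_months

vol = gp3_monthly(gb=2_000, iops=16_000, mbps=1_000)  # maxed-out gp3 volume
drv = nvme_monthly(drive_cost=300.0)                  # assumed 2 TB NVMe price

print(f"gp3 (2 TB, 16K IOPS, 1 GB/s): ${vol:,.0f}/month")   # ~$260
print(f"local NVMe (2 TB, $300 drive): ${drv:,.0f}/month")  # ~$8
```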

The so-what is that entire architectural patterns have been constructed specifically to work around cloud storage latency. Read-through caches, write buffers, async queuing architectures, and tiered storage designs exist not because they represent the best solution but because network-attached storage made them necessary. When latency drops from milliseconds to microseconds, many of those patterns become unnecessary complexity. Removing them simplifies the system, reduces the operational surface, and eliminates failure modes that exist solely because of the storage model. Applications that were designed around EBS constraints often run faster and more simply on NVMe without any code changes beyond configuration.

7. S3 durability and why that number matters

Amazon S3’s eleven nines of durability, which is 99.999999999%, is not marketing. It is an architectural achievement: at that durability level, an organisation storing ten million objects can expect to lose a single object roughly once every ten thousand years on average. It was achieved through aggressive erasure coding, cross-facility replication, and sustained investment in data integrity verification that spans the entire AWS infrastructure footprint, and it is genuinely difficult to replicate without that scale of investment.
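
The arithmetic behind that statement, as a short worked example using the standard expected-loss reading of an annual durability figure:

```latex
% Expected annual object loss at eleven-nines annual durability
E[\mathrm{loss}] = N \,(1 - d), \qquad d = 0.99999999999 = 1 - 10^{-11}

% Worked example with N = 10^7 stored objects:
E[\mathrm{loss}] = 10^{7} \times 10^{-11} = 10^{-4}\ \text{objects per year}

% i.e. one object lost, on average, every ten thousand years.
```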

This matters because the argument for moving compute away from cloud does not extend equally to object storage. S3, or an equivalent service like Cloudflare R2, remains the appropriate durability layer for almost every edge architecture because it provides a property that is extremely hard to reproduce on-premises and not worth trying to reproduce when the cost is already low. R2 in particular eliminates egress charges entirely, which changes the economics of using object storage as an integration and archival layer between systems.

The implication is that a realistic ARM edge architecture does not eliminate cloud object storage, it promotes it to the role that cloud compute used to play. S3 becomes the centre of gravity for durability, recovery, and cross-system integration while compute and primary storage move to hardware that can be owned and operated. This is a fundamentally different model from either full cloud or full on-premises. It is selective cloud, where you pay for the primitives that deliver genuine, hard-to-replicate value and stop paying for the ones that are merely convenient.

The practical consequence is that backup and recovery design becomes simpler. Compute is ephemeral and rebuilds from S3. Local NVMe holds working state and checkpoints asynchronously to object storage. Failure recovery becomes a rebuild problem rather than a replication problem, and the eleven nines durability guarantee covers the layer that matters most.
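
A minimal sketch of both halves of that pattern using boto3, which works against S3 or any S3-compatible endpoint such as R2 when given an endpoint_url; the bucket name, key layout, and snapshot path are illustrative assumptions.

```python
import time
import boto3

# Works against S3 or any S3-compatible endpoint (e.g. Cloudflare R2
# via endpoint_url). Bucket and key layout are illustrative assumptions.
s3 = boto3.client("s3")

BUCKET = "example-durability-anchor"      # hypothetical bucket name
CHECKPOINT_PATH = "/data/state.snapshot"  # local NVMe working state

def checkpoint_to_s3() -> None:
    """Ship the latest local snapshot to the durability layer."""
    key = f"checkpoints/{int(time.time())}/state.snapshot"
    s3.upload_file(CHECKPOINT_PATH, BUCKET, key)

def restore_latest(dest: str = CHECKPOINT_PATH) -> None:
    """On rebuild, pull the newest checkpoint and resume from it."""
    pages = s3.get_paginator("list_objects_v2").paginate(
        Bucket=BUCKET, Prefix="checkpoints/")
    newest = max(
        (obj for page in pages for obj in page.get("Contents", [])),
        key=lambda o: o["LastModified"])
    s3.download_file(BUCKET, newest["Key"], dest)
```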

8. The open source stack that fills the gap

The primary concern when evaluating a move away from managed cloud services is not cost or performance but the loss of the operational abstractions that cloud provides. The gap is real, but it is largely closed by a mature and widely deployed set of open source tools. The following is not a theoretical wishlist. These are production-grade projects with large deployment bases.

Compute orchestration. Kubernetes closes the gap between managed EKS and self-managed compute. Talos Linux provides an immutable, API-driven operating system designed for Kubernetes nodes that eliminates configuration drift and SSH access as a failure mode. K3s is appropriate for smaller deployments where the full Kubernetes control plane is disproportionate.

Storage. Longhorn provides distributed block storage with snapshot and backup integration for Kubernetes workloads, with S3-compatible backup targets. Rook with Ceph provides production-grade distributed storage for larger deployments. For direct NVMe access patterns, no orchestration layer is required and local storage performs significantly better without one.

Networking and ingress. Cilium replaces kube-proxy with eBPF-based networking that provides network policy, observability, and service mesh capabilities without a sidecar model. Cloudflare Tunnel replaces inbound firewall rules and public IP exposure entirely, which changes the security posture of edge-deployed infrastructure from inside-out to outside-in.

Secrets and identity. HashiCorp Vault, now under the BSL licence with an open source fork available as OpenBao, provides secrets management, dynamic credentials, and PKI. External Secrets Operator integrates Vault and other backends with Kubernetes natively.

Observability. The Prometheus and Grafana stack provides metrics, alerting, and dashboards. OpenTelemetry provides vendor-neutral instrumentation. Loki handles log aggregation. Together they provide observability capability equivalent to CloudWatch at a fraction of the cost and with significantly better retention economics.
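
As an indication of how little code the self-hosted metrics path requires, here is a minimal sketch using the official prometheus_client library; the metric names, port, and simulated workload are illustrative.

```python
import random
import time

# Official Prometheus Python client: pip install prometheus-client
from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metrics; Prometheus scrapes them from :9100/metrics.
REQUESTS = Counter("app_requests_total", "Requests handled", ["route"])
LATENCY = Histogram("app_request_seconds", "Request latency in seconds")

def handle_request(route: str) -> None:
    with LATENCY.time():                         # observe duration automatically
        REQUESTS.labels(route=route).inc()
        time.sleep(random.uniform(0.001, 0.01))  # stand-in for real work

if __name__ == "__main__":
    start_http_server(9100)                      # expose /metrics for scraping
    while True:
        handle_request("/api/v1/items")
```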

GitOps and deployment. ArgoCD or Flux provide continuous deployment with audit trails, rollback, and drift detection. Combined with immutable infrastructure patterns, they replace the operational need for console access to production systems.

Database. Managed RDS is the hardest cloud service to replace cleanly. PostgreSQL with Patroni provides high availability with automatic failover. Vitess, the CNCF project that underpins PlanetScale, handles sharding for MySQL workloads. CloudNativePG is a Kubernetes operator for PostgreSQL that handles replication, backup to S3, and failover with significantly less operational overhead than managing Patroni directly.
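
On the client side, failover against a Patroni-managed cluster is largely a connection-string concern: libpq, and therefore psycopg2, can be handed every cluster member and asked for whichever node is currently the writable primary. A sketch with illustrative hostnames:

```python
# Client-side failover against a Patroni-managed PostgreSQL cluster.
# libpq tries each host in turn, and target_session_attrs=read-write
# ensures we land on the current primary. Hostnames are illustrative.
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect(
    host="pg-node-1,pg-node-2,pg-node-3",  # all Patroni cluster members
    port="5432,5432,5432",
    dbname="app",
    user="app",
    password="example",                    # use real secrets management
    target_session_attrs="read-write",     # only accept the primary
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT pg_is_in_recovery()")
    print("connected to primary:", cur.fetchone()[0] is False)
```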

The so-what is that the operational gap between cloud-managed services and self-operated infrastructure has narrowed to a matter of engineering discipline rather than tooling availability. None of these projects are experimental. All of them run production workloads at scale. The barrier is not capability, it is the willingness to own and develop that operational expertise rather than delegating it to a hyperscaler margin.

9. Security shifts from infrastructure to edge

Security in a cloud-native architecture tends to be implemented as a set of controls layered inside the cloud provider’s network boundary. VPCs, security groups, NACLs, WAF rules, and shield configurations all assume that traffic reaches AWS or GCP infrastructure before it is filtered. This model works, but it means your attack surface is anything that reaches your cloud perimeter, and your provider’s infrastructure is inside that perimeter.

Cloudflare’s model inverts this. Traffic is processed at the edge, distributed across more than three hundred network locations globally, before it reaches origin infrastructure. DDoS mitigation, WAF, bot management, and zero trust access controls operate at that layer before a single packet reaches a server. This is not a minor architectural improvement. It means that a large class of attacks that would require expensive cloud-side infrastructure to absorb simply never reaches origin.

| Capability | Cloud-native model | Cloudflare edge model | Practical consequence |
|---|---|---|---|
| DDoS mitigation | Regional, capacity-based | Global, scrubbing at PoP | Attack traffic absorbed closer to source |
| WAF | Managed add-on with per-request cost | Included, pre-origin | No compute consumed processing attack traffic |
| Zero trust access | VPN or complex identity provider integration | Cloudflare Access | Browser-delivered, no client required |
| Origin IP exposure | Public by necessity or complex VPC design | Tunnel eliminates public origin entirely | Origin unreachable without Cloudflare authentication |
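
To illustrate the inverted posture in the last row: the origin below binds only to loopback, so it has no public listener at all, and the only path in is an outbound tunnel. The port is an illustrative choice, and the cloudflared invocation appears as a comment.

```python
# Origin with no public listener: bind to loopback only, then publish it
# through an outbound-only Cloudflare Tunnel, e.g. (illustrative):
#   cloudflared tunnel --url http://127.0.0.1:8080
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        # CF-Connecting-IP is set by Cloudflare; absent on direct hits.
        client = self.headers.get("CF-Connecting-IP", "not via Cloudflare")
        self.wfile.write(f"hello from origin, client={client}\n".encode())

if __name__ == "__main__":
    # Binding to 127.0.0.1 means nothing outside the host reaches this socket.
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```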

The implication is that moving compute to colocation does not weaken the security posture if edge controls are in place. For many organisations it strengthens it, because the attack surface is reduced by removing public IP exposure and concentrating filtering at a layer that is physically closer to attackers and has significantly more traffic inspection capacity than any single cloud region.

10. Cloud primitives do not disappear, they find their correct roles

A mature ARM edge architecture does not eliminate cloud. It finds the minimum set of cloud services that deliver genuine, non-replicable value and treats everything else as a workload classification decision. The services that survive this test are almost always the ones that require global scale, extreme durability guarantees, or integration with external systems that already depend on them.

| Cloud service | Edge equivalent | Transition model |
|---|---|---|
| EC2 compute | Ampere Altra bare metal or Graviton at smaller scale | Migrate steady-state workloads, retain burst in cloud |
| EBS block storage | Local NVMe on compute nodes | Eliminate for primary storage, retain snapshots to S3 |
| S3 object storage | Cloudflare R2 for egress-sensitive workloads, S3 retained for eleven nines durability | S3 becomes durability anchor, R2 becomes delivery layer |
| RDS managed database | CloudNativePG on Kubernetes with S3 backup | Requires operational investment but eliminates per-instance premium |
| Load balancing | Cloudflare routing with origin pools | Cloudflare provides global routing with health checks natively |
| Auto scaling | Node pools with predictive provisioning | Steady-state workloads do not need reactive scaling |

The pattern that emerges is that the services worth retaining tend to be data services rather than compute services. Compute economics change dramatically when you own the hardware. Data durability economics change only when you can match eleven nines at comparable cost, and most organisations cannot.

11. Resilience becomes a rebuild problem, not a replication problem

Cloud has normalised the idea that replication equals safety. Multi-AZ deployments, read replicas, cross-region replication, and synchronous failover are all sold as resilience features and they do protect against infrastructure failure. But replication does little to prevent logical failure, and it often amplifies it by distributing corrupted state across every replica before the corruption is detected. The types of failures that dominate modern systems are increasingly driven by software defects, misconfiguration, and data corruption, and replication makes those failures harder to recover from rather than easier.

A rebuild-first model changes the objective from preserving running state to restoring correct state. Any node can be destroyed and recreated from a known good baseline stored in object storage. Recovery is exercised regularly as a standard operational procedure rather than tested once during a DR drill and forgotten. Failure is treated as an expected condition that the system must handle gracefully rather than an exceptional event that requires manual intervention.
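
One way to keep recovery honest is to exercise it continuously rather than annually. The sketch below uses the official Kubernetes Python client to run such a drill; the label selector and rebuild budget are illustrative assumptions, and it presumes the provisioning layer genuinely recreates deleted nodes from the object-storage baseline.

```python
import random
import time

# Official Kubernetes Python client: pip install kubernetes
from kubernetes import client, config

# Illustrative drill: periodically destroy one rebuildable node and
# verify the cluster converges back, making recovery a routine path.
config.load_kube_config()
v1 = client.CoreV1Api()

DRILL_SELECTOR = "rebuild-drill=enabled"   # assumed label on eligible nodes

def destroy_one_node() -> str:
    nodes = v1.list_node(label_selector=DRILL_SELECTOR).items
    victim = random.choice(nodes).metadata.name
    v1.delete_node(victim)                 # provisioning layer must recreate it
    return victim

if __name__ == "__main__":
    name = destroy_one_node()
    print(f"deleted {name}; waiting for replacement registration...")
    time.sleep(300)                        # assumed rebuild budget: 5 minutes
    current = {n.metadata.name for n in v1.list_node().items}
    print("cluster node count after drill:", len(current))
```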

The implication is that resilience becomes cheaper, more testable, and better aligned with the failure modes that actually occur. It also eliminates a class of hidden coupling between infrastructure and state that makes cloud architectures fragile in ways that are not visible until they fail. A large portion of replication cost in cloud environments turns out to be protecting against a narrower class of failures than most teams assume, and rebuilding often recovers more cleanly from the failures that actually matter.

12. The operating model is the real disruption

The technical components of this shift are available, mature, and well-documented. The open source stack exists. The hardware is commercially available. The edge security model works. The barrier is not technical capability, it is the operating model that a rebuild-first, edge-oriented architecture requires.

Cloud abstracted infrastructure complexity at a time when that complexity genuinely could not be managed by most engineering teams. The trade was reasonable. In exchange for paying a significant ongoing premium, organisations received freedom from hardware procurement cycles, immediate access to global infrastructure, and managed services that allowed small teams to run large systems without deep operational expertise. The trade made sense when the alternative was a data centre with a six-month lead time and a team of network engineers.

That alternative no longer looks anything like it did. Infrastructure automation has reduced the operational burden substantially. The open source tooling described above provides most of the capability that managed services provide, with more configuration flexibility and without the per-unit pricing that compounds at scale. Colocation in modern facilities is available at costs that have not risen as sharply as cloud pricing has, particularly for ARM-compatible hardware.

What has not changed is that running this model requires stronger engineering discipline, clearer ownership of operational processes, and a team that understands how systems behave under failure conditions. Cloud teams optimise deployments. Edge teams design for failure and verify that their recovery procedures work. The distinction sounds minor but it produces different architectures, different on-call cultures, and different cost trajectories over time.

The so-what is that the barrier to leaving cloud is no longer technical but organisational. The organisations that build this operational capability will have a structural cost advantage that compounds over time. The organisations that do not will continue optimising inside a model that was designed for a different era of infrastructure economics, and the gap will widen until a competitive event forces the question at a time of their competitor’s choosing rather than their own.

13. Where cloud still wins, and why that boundary matters

Cloud remains the correct choice for workloads that require burst elasticity at timescales that cannot be predicted or provisioned in advance, for global real-time coordination across many geographic regions simultaneously, and for deep integration with managed AI services where the capital cost of running equivalent infrastructure would be prohibitive. These scenarios are real and they represent a meaningful subset of enterprise workloads.

The mistake is not using cloud. It is using it by default for workloads that exhibit none of those characteristics, because the default was established at a time when the alternative was not viable and has never been explicitly reconsidered. Once that distinction is drawn clearly, cloud becomes a targeted tool deployed where its strengths are genuinely needed rather than a universal solution that every workload inherits without question.

The boundary also shifts over time. As ARM hardware becomes more widely available, as the open source stack matures further, and as engineering teams develop operational capability, the category of workloads for which cloud is the correct answer narrows. That trajectory has one direction.

14. The final provocation

If a workload is predictable, long-running, and not dependent on burst elasticity or global real-time coordination, then running it in an environment designed for those properties is an explicit subsidy to your cloud provider, not a rational infrastructure decision. The combination of ARM compute, local NVMe storage, edge-based security, S3-anchored durability, and a mature open source operational stack has made a large proportion of steady-state compute contestable in a way that it was not five years ago.

The organisations that recognise this shift early do not just reduce cost, though the cost reduction at scale is large enough to be strategically significant. They simplify their architectures by removing layers of abstraction that exist to compensate for cloud infrastructure constraints. They regain visibility into how their systems actually behave. And they develop engineering capability that is durable rather than dependent on a vendor’s ongoing pricing decisions.

Cloud does not disappear. It compresses into the roles where its properties are genuinely superior: burst compute, extreme durability via S3, and global coordination infrastructure. Everything else becomes a decision that should be made explicitly, with current economics, against the actual behaviour of the workload. The organisations that have not yet had that conversation are not avoiding risk. They are accumulating it.


References

Hardware and compute

  1. AWS Graviton processor overview — official AWS page covering Graviton generations and the 40% price-performance claim versus x86.
  2. AWS Graviton getting started guide — migration framework and benchmark methodology from AWS.
  3. Ampere Altra processor — Ampere’s product page for the Altra family used in Hetzner, OCI, and PhoenixNAP deployments.
  4. Hetzner Ampere Altra dedicated servers launch — announcement of the RX line, the first European ARM dedicated server offering.
  5. Hetzner CAX ARM cloud servers — launch of Hetzner’s Ampere Altra cloud instances with NVMe-backed storage.
  6. AWS Graviton Wikipedia entry — covers all five generations including the Graviton5 announcement in December 2025.

Storage

  1. Amazon S3 storage classes — official S3 page confirming the 99.999999999% (eleven nines) durability guarantee across storage classes.
  2. Amazon S3 product page — S3 overview confirming eleven nines durability, 500 trillion objects stored as of 2026, and the 2006 launch history.
  3. How Amazon S3 stores 350 trillion objects with 11 nines of durability — ByteByteGo deep dive into S3’s architecture including erasure coding, cross-AZ replication, and index integrity mechanisms.
  4. Cloudflare R2 overview — R2 documentation home, covering the S3-compatible API and zero egress fee model.
  5. Cloudflare R2 pricing — confirms zero egress charges for all R2 storage classes and provides billing examples.

Open source tooling

  1. Talos Linux — immutable, API-driven operating system designed for Kubernetes; eliminates SSH access and configuration drift.
  2. K3s — lightweight certified Kubernetes distribution suitable for edge and resource-constrained environments.
  3. Longhorn — CNCF distributed block storage for Kubernetes with S3-compatible backup targets.
  4. Rook — production-grade Ceph storage orchestration for Kubernetes.
  5. Cilium — eBPF-based networking, observability, and security for Kubernetes; replaces kube-proxy.
  6. Cloudflare Tunnel — eliminates inbound firewall rules and public IP exposure by establishing outbound-only connections to Cloudflare’s edge.
  7. OpenBao — open source fork of HashiCorp Vault under the MPL licence, providing secrets management and PKI.
  8. External Secrets Operator — Kubernetes operator that syncs secrets from Vault, AWS Secrets Manager, and other backends.
  9. CloudNativePG — Kubernetes operator for PostgreSQL providing replication, S3 backup, and automatic failover.
  10. Patroni — high availability template for PostgreSQL with automatic failover using etcd, Consul, or ZooKeeper.
  11. ArgoCD — declarative GitOps continuous delivery for Kubernetes with drift detection and rollback.
  12. OpenTelemetry — vendor-neutral observability instrumentation framework covering traces, metrics, and logs.
  13. Prometheus — widely deployed metrics collection and alerting system.
  14. Grafana — observability and visualisation platform used with Prometheus, Loki, and OpenTelemetry.

Security and edge

  1. Cloudflare DDoS protection — overview of Cloudflare’s global scrubbing network and pre-origin traffic filtering.
  2. Cloudflare Access (Zero Trust) — browser-delivered zero trust access control without a VPN client.

Background reading

  1. Graviton price-performance benchmarking — Hykell — independent analysis of Graviton versus x86 pricing across instance families.
  2. nOps Graviton cost savings analysis — covers RDS Graviton migration and compounding savings from architectural shifts.