Amazon EVS Migration: Cut Costs by 46% Today

March 24, 2026 Blog 15 min read

Amazon EVS delivers up to 46% cost savings by shifting lifecycle management to the customer within a consolidated domain. Unlike the managed VMC model, Amazon EVS forces organizations to own patching and route propagation, a trade-off that demands rigorous architectural planning to avoid connectivity failures.

Readers will dissect the specific mechanics of BGP peering required to bridge T0 gateways with AWS Transit Gateway, ensuring smooth route propagation across hybrid environments. We will also analyze how AI-driven API demand, projected to surge 30% by 2027 according to AWS migration data, necessitates the elastic scaling capabilities inherent in this bare-metal EC2 deployment model. Finally, the discussion covers strategic VPC peering patterns that maintain micro-segmentation policies while integrating with native Amazon VPC route tables.

The shift from vendor-managed SDDC to customer-controlled consolidated domains represents a fundamental change in operational responsibility for vSphere and vSAN clusters. By using i4i.metal instances directly in your subnets, you gain granular network control but lose the abstraction layer previously provided by VMware. Understanding these two-layer networking dynamics is critical for teams attempting to lift and shift workloads without refactoring applications or retraining staff on entirely new paradigms.

The Role of NSX Overlay and AWS Underlay in EVS Architecture

Amazon EVS Two-Layer Networking: VPC Underlay and NSX Overlay

Amazon EVS separates physical VPC infrastructure from logical NSX overlay segments to isolate management traffic from workload data planes. The underlay layer comprises subnets, route tables, and ENI-attached ESXi hosts that handle vSphere and vSAN communication via the main VPC route table. Customer workloads execute exclusively on overlay segments. This architecture operates as a consolidated domain where management components and tenant applications share the same cluster resources rather than residing in segregated environments.

Layer Component	Function	Traffic Type
Underlay VPC	Host connectivity	Management, vSAN
NSX Overlay	Logical switching	Workload data
T0 Gateway	Route advertisement	BGP peering

The reliance on Interface VPC Endpoints creates a hard constraint, as S3 Gateway Endpoints remain unsupported for this traffic pattern. Network reconfiguration complexity increases when migration plans fail to account for the strict separation between underlay VPC networking and the abstracted overlay CIDRs. Operators must plan for distinct routing behaviors since the underlay handles host survival while the overlay manages tenant policy enforcement. This split dictates that any failure in the physical ENI path immediately disrupts the logical control plane, creating a single point of dependency despite the software-defined abstraction.

Deploying i4i.metal and i7i.metal-24xl

Amazon EC2 bare-metal instances such as i4i.metal and i7i.metal-24xl form the physical substrate for vSAN clusters within Amazon EVS. These instances provide direct hardware access, eliminating the hypervisor overhead typical of virtualized cloud compute layers. The i4i.metal variant supplies 30 TiB of local NVMe storage per host, which VMware vSAN aggregates into a distributed datastore without external block dependencies. This local NVMe architecture removes separate storage billing lines, fundamentally altering the total cost of ownership compared to traditional SAN attachments. Operators must select between the high-capacity i4i.metal and the newer i7i.metal-24xl based on core-count requirements rather than storage needs, as both use identical disk configurations.

Unlike on-premises VMware deployments, which require distinct network fabrics for storage and compute, the AWS underlay converges these traffic types onto shared Elastic Network Interfaces. This convergence introduces a single point of contention where storage latency correlates directly with underlay network congestion. The absence of external Fibre Channel switches simplifies the bill of materials but increases reliance on consistent VPC routing performance. Failure to account for this dependency risks cascading storage timeouts during VPC route table updates.

Validating Service Access Subnets and SDDC Manager Connectivity

Correct underlay operation requires a dedicated service access subnet hosting management Elastic Network Interfaces. This isolated segment enables control-plane connectivity between AWS infrastructure and VCF appliances, preventing route conflicts with workload traffic. Operators must verify that Interface VPC Endpoints handle all service traffic, as S3 Gateway Endpoints remain unsupported as of March 2026. The SDDC Manager and vCenter instances reside strictly within the first SDDC cluster to orchestrate the management domain. This placement mandates that no external monitoring agents occupy these reserved subnets.

Validation steps for the management plane include:

Confirming ENI placement exclusively within the assigned service access CIDR block.
Verifying T0 Gateway eBGP sessions establish correctly with VPC Route Server endpoints.
Ensuring NSX Managers form a quorum before promoting workload domain configurations.
Testing DNS resolution for vSphere components without relying on public resolvers.

The limitation of this architecture is strict: any deviation in subnet allocation causes immediate deployment failure because the automation framework cannot reconcile overlapping CIDRs.
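The ENI placement check in the list above can be rehearsed offline before touching the environment. The following is a minimal sketch, assuming the management ENI addresses have already been exported into a simple list; the `enis_outside_subnet` helper, the ENI IDs, and the `10.0.10.0/28` service access CIDR are all hypothetical values chosen for illustration.

```python
import ipaddress


def enis_outside_subnet(enis, service_access_cidr):
    """Return the IDs of ENIs whose private IP is outside the given CIDR."""
    subnet = ipaddress.ip_network(service_access_cidr)
    return [
        eni["id"]
        for eni in enis
        if ipaddress.ip_address(eni["private_ip"]) not in subnet
    ]


# Hypothetical inventory; in practice this would come from your own export.
management_enis = [
    {"id": "eni-aaa", "private_ip": "10.0.10.5"},
    {"id": "eni-bbb", "private_ip": "10.0.10.9"},
    {"id": "eni-ccc", "private_ip": "10.0.20.7"},  # sits in another subnet
]

misplaced = enis_outside_subnet(management_enis, "10.0.10.0/28")
print(misplaced)  # ['eni-ccc']
```

An empty result means every listed ENI sits inside the service access block; anything else is worth resolving before deployment starts.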

Active/Standby NSX Edge T0 Peering with Dual VPC Route Servers

Two NSX Edge T0 nodes establish distinct eBGP sessions to separate Route Server Endpoints. This topology mandates an Active/Standby configuration in which the primary edge handles all north-south traffic while the secondary remains idle until failure detection triggers a failover event. Each edge node pairs exclusively with one of the two required VPC Route Server endpoints, preventing asymmetric routing paths that could disrupt stateful inspection engines.

The peering process follows a strict sequence to maintain overlay integrity:

The active T0 gateway advertises overlay CIDRs to its assigned route server peer.
The standby node suppresses route advertisements to avoid blackholing traffic during normal operations.
Underlay ENIs enable the TCP connection for BGP keepalives between edge and route server.

Mode	Route Advertisement	Traffic Flow
Active	Propagates prefixes	Handles all ingress/egress
Standby	Suppresses prefixes	Drops traffic until promotion

Operators must account for the lack of Multi-hop BFD support, relying instead on standard BGP timers for liveness detection which extends convergence time compared to optimized on-premises designs. This constraint forces a trade-off between simplicity and speed, as quicker failover requires tuning keepalive intervals at the risk of false positives during transient network jitter. The architecture abstracts complex routing between the EVS environment

Packet fragmentation occurs immediately when the NSX transport MTU exceeds the 8500-byte ceiling enforced by Transit Gateway. Current-generation EC2 instances support frames up to 9001 bytes, yet the underlay path often traverses components with stricter limits. Operators must cap the overlay MTU at 8500 bytes to accommodate Direct Connect Transit VIF constraints without triggering IP fragmentation or ICMP blackholing.

Component	Maximum Frame Size	Constraint Source
EC2 Bare-Metal	9001 bytes	Instance Hardware
Transit Gateway	8500 bytes	AWS Service Limit
Direct Connect VIF	8500 bytes	Physical Circuit

Misalignment here silently degrades throughput for large vSAN replication streams and backup jobs. The cost of ignoring this limit is measurable packet loss rather than immediate session failure. Configuration requires explicit adjustment within the NSX Manager transport zone profiles before workload deployment.

Calculate total header overhead for Geneve encapsulation plus outer IP headers.
Set the physical MTU on underlay interfaces to 9001 bytes where supported.
Configure the NSX tunnel endpoint MTU to 8500 bytes to match the lowest common denominator.
Validate end-to-end connectivity using large-ping tests across the Route Server Endpoints.

Overlapping IP spaces in overlay networks compound this issue by preventing standard traceroute diagnostics from identifying the specific hop dropping oversized frames. The limitation forces a strict hierarchy where the most restrictive network segment dictates the global MTU policy for the entire SDDC.
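The MTU steps above reduce to a small piece of arithmetic: take the smallest limit on the path, subtract encapsulation headroom, and size the test ping accordingly. The sketch below uses the frame sizes from the table; the 100-byte encapsulation headroom is an assumption for illustration rather than a figure from this article, and the function names are hypothetical. The 28-byte ping adjustment is the standard IPv4 header (20 bytes) plus ICMP header (8 bytes).

```python
# Frame-size limits taken from the table above, in bytes.
PATH_LIMITS = {
    "EC2 bare-metal": 9001,
    "Transit Gateway": 8500,
    "Direct Connect VIF": 8500,
}

# ASSUMPTION: headroom reserved for overlay encapsulation headers.
# Replace with the overhead calculated for your own environment.
ASSUMED_ENCAP_OVERHEAD = 100

IPV4_HEADER = 20
ICMP_HEADER = 8


def path_mtu(limits):
    """The most restrictive component dictates the usable MTU."""
    return min(limits.values())


def max_workload_mtu(tunnel_mtu, overhead=ASSUMED_ENCAP_OVERHEAD):
    """Largest inner MTU that still fits inside the tunnel MTU."""
    return tunnel_mtu - overhead


def ping_payload_for(mtu):
    """ICMP payload size that produces an IPv4 packet of exactly `mtu` bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


tunnel_mtu = path_mtu(PATH_LIMITS)
print(tunnel_mtu)                    # 8500
print(max_workload_mtu(tunnel_mtu))  # 8400
print(ping_payload_for(tunnel_mtu))  # 8472
```

Sending a ping with the computed payload size and the don't-fragment flag set is one way to confirm that the path really carries the intended frame size end to end.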

Blind Spots in VPC Flow Logs for NSX Overlay Traffic Monitoring

VPC Flow Logs capture only underlay ENI traffic, leaving NSX overlay segments completely invisible to standard AWS monitoring tools. Customer workloads run exclusively on these overlay segments, so operators relying solely on AWS native telemetry face a total blind spot during troubleshooting events. The NSX Edge T0 gateway can become a throughput bottleneck if undersized, yet this performance degradation remains undetectable without VMware tools such as Aria Operations or Traceflow.

Monitoring Scope	Visible Data	Blind Spot
VPC Flow Logs	Underlay ENI flows	Overlay segment traffic
NSX Traceflow	Logical path traces	Physical host metrics
Aria Operations	Full-stack visibility	None

This architectural separation forces a dual-tool operational model in which AWS logs confirm host reachability while VMware utilities diagnose application connectivity. Failure to deploy dedicated VMware monitoring results in extended mean-time-to-resolution for overlay-specific failures.

Transit Gateway Mandate for Amazon EVS Overlay Segments

VPC peering remains unsupported, forcing all external connectivity for an Amazon EVS VPC through Transit Gateway or AWS Cloud WAN. This architectural constraint eliminates direct peering options and prohibits Private VIFs and VGW-based site-to-site VPNs in the underlay. Operators must configure static routes in the Transit Gateway route table for every NSX overlay CIDR range because these routes are not imported automatically. The dependency on centralized hubs introduces a single point of policy enforcement that simplifies governance but increases configuration overhead.

Dimension	Transit Gateway	AWS Cloud WAN
Route Propagation	Manual static entries required	Flexible via core network policy
Regional Scope	Single region per gateway	Global network fabric
Policy Management	Attachment-based rules	Intent-driven segments
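Because every NSX overlay CIDR needs its own static entry, it helps to generate the route definitions from a single list rather than typing them by hand. The sketch below only builds the request parameters and makes no AWS calls; the key names mirror the parameters of the EC2 `CreateTransitGatewayRoute` API, while the `build_static_routes` helper, the overlay CIDRs, and the resource IDs are hypothetical placeholders.

```python
import ipaddress


def build_static_routes(overlay_cidrs, route_table_id, attachment_id):
    """Build one static-route request per NSX overlay CIDR."""
    routes = []
    for cidr in overlay_cidrs:
        # ip_network raises ValueError for malformed CIDRs or set host bits,
        # so typos are caught before anything reaches the route table.
        network = ipaddress.ip_network(cidr)
        routes.append(
            {
                "DestinationCidrBlock": str(network),
                "TransitGatewayRouteTableId": route_table_id,
                "TransitGatewayAttachmentId": attachment_id,
            }
        )
    return routes


# Hypothetical overlay segments and resource IDs, for illustration only.
overlay_cidrs = ["192.168.10.0/24", "192.168.20.0/24"]
routes = build_static_routes(
    overlay_cidrs,
    route_table_id="tgw-rtb-0123456789abcdef0",
    attachment_id="tgw-attach-0123456789abcdef0",
)

for route in routes:
    print(route["DestinationCidrBlock"], "->", route["TransitGatewayAttachmentId"])
```

Keeping the overlay CIDR list in version control gives a single place to update whenever a new segment is added, which is exactly when a static route tends to be forgotten.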

Traffic destined for Amazon S3 must traverse Interface VPC Endpoints. This requirement forces all storage traffic through the T0 gateway, adding a hop that consumes throughput capacity. The lack of VPC peering support isolates the Amazon EVS environment, preventing lateral movement without traversing the hub, where AWS Network Firewall can provide centralized inspection. Shared services belong in adjacent VPCs connected via the hub to avoid route table sprawl.

Implementing Dedicated Amazon EVS VPCs for Blast-Radius Isolation

Keeping the Amazon EVS VPC dedicated exclusively to Amazon EVS resources prevents shared-tenant noise from obscuring forensic analysis during security incidents. This topology forces all north-south traffic through a centralized inspection point, enabling AWS Network Firewall inspection. Operators gain clear cost allocation lines, yet lose the low-latency direct path that VPC peering would otherwise provide if supported.

The architectural constraint creates a specific trade-off between operational simplicity and routing flexibility. Transit Gateway requires manual static route entries for every NSX overlay CIDR, whereas AWS Cloud WAN allows flexible policy propagation across AWS Regions. Choosing the wrong hub type delays convergence during failover events by minutes rather than seconds.

Dimension	Dedicated EVS VPC	Shared VPC Model
Blast Radius	Contained to single VPC	Spills into adjacent subnets
Audit Scope	Centralized at TGW	Fragmented across ENIs
Route Mgmt	Static entries per CIDR	Automatic local propagation
Cost Visibility	Fully isolated billing	Blended infrastructure charges

Management traffic relies on Elastic Network Interfaces placed in a specific service access subnet, which must remain unreachable from non-VCF workloads to satisfy compliance audits. S3 access further complicates this design because S3 Gateway Endpoints remain unsupported. Andrew Haigh of DXC Technology endorses this isolated approach for combining trusted platforms with cloud-native services without compromising security posture. The limitation is measurable: every new overlay segment requires a corresponding static route update in the central hub table.

Connectivity Failure Modes: Unsupported Private VIFs and VGW VPNs

Direct attachment of Private VIFs or VGW-based VPNs to the Amazon EVS underlay VPC triggers immediate connectivity loss because the service architecture forbids these terminations. Operators attempting to bypass Transit Gateway for on-premises integration face a hard stop, as Amazon EVS lacks the native gateways required to process Direct Connect terminations. This constraint forces all hybrid traffic through a centralized hub, eliminating direct peering shortcuts common in legacy VMware on-premises designs.

Connection Type	Supported on EVS VPC	Required Termination Point
Private VIF	No	Transit Gateway
VGW Site-to-Site VPN	No	Transit Gateway
Transit VIF	Yes	Transit Gateway
AWS Cloud WAN	Yes	Core Network Policy

The operational penalty involves manual route management; users must insert static entries in the Transit Gateway route table for every NSX overlay CIDR since automatic propagation fails. A secondary failure mode emerges when architects assume standard storage access patterns apply; S3 Gateway Endpoints remain unsupported as of March 2026, mandating traffic redirection through Interface VPC Endpoints. This requirement increases hop count and introduces potential latency if the inspection path is undersized.

Implementing Strong CIDR Planning and DNS Configuration for EVS

Non-Overlapping CIDR Blocks and the /24 Underlay Minimum

Figure: Dashboard showing 90% of deployment issues prevented by planning, 34% migration time reduction, 30% effectiveness boost, 46% cost savings potential, and CIDR constraints limiting subnets to 112 hosts.

Allocate a minimum /24 CIDR block per VPC to satisfy the strict underlay infrastructure requirements for Amazon EVS.

Reserve dedicated address space where management, vSAN, and NSX interfaces operate without overlap from workload segments.
Verify that all overlay CIDR ranges remain distinct from the primary VPC block and on-premises networks to prevent silent traffic blackholing.
Deploy customer workloads exclusively on overlay segments.

Host-vtep networks consume two addresses per host, which limits a single /24 subnet to a maximum of 112 hosts. Misconfigurations during this allocation phase are a frequent cause of deployment failures, and expanding the blocks after deployment demands complex re-architecting because the address plan is rigid once the environment is built. AWS documentation from re:Invent 2024 details these constraints.

Connectivity to AWS services such as S3 demands Interface Endpoints. This constraint forces all service traffic through the ENI fabric, so latency increases slightly while policy enforcement remains intact. Overlapping subnets within NSX segments can isolate test workloads, but the underlay must stay pristine. Failure to segregate infrastructure traffic from customer data creates forensic blind spots during incident response.
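The non-overlap requirement is easy to check mechanically before any address block is committed. The following is a minimal sketch using Python's standard `ipaddress` module; the `find_overlaps` helper and every CIDR in the sample plan are hypothetical, with one collision planted deliberately to show what a failure looks like.

```python
import ipaddress
from itertools import combinations


def find_overlaps(named_cidrs):
    """Return pairs of names whose CIDR blocks overlap."""
    networks = {
        name: ipaddress.ip_network(cidr) for name, cidr in named_cidrs.items()
    }
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical address plan; overlay-db deliberately collides with the underlay.
plan = {
    "evs-vpc-underlay": "10.0.0.0/24",
    "overlay-app": "192.168.10.0/24",
    "overlay-db": "10.0.0.128/25",
    "on-prem": "172.16.0.0/16",
}

conflicts = find_overlaps(plan)
print(conflicts)  # [('evs-vpc-underlay', 'overlay-db')]
```

An empty list is the only acceptable result before provisioning; each reported pair identifies two blocks that would otherwise risk silent blackholing.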

Using the AWS Spreadsheet Tool to Validate CIDR and DNS Feasibility

Execute the AWS spreadsheet tool to validate account, VPC, CIDR, and DNS feasibility before provisioning any resources.

Input proposed Amazon EVS subnet ranges to verify non-overlap with existing on-premises networks and VPC primary blocks.
Confirm DNS forwarder configurations align with Route 53 Resolver endpoints to prevent resolution failures during SDDC deployment.
Cross-reference host-vtep consumption rates against the /24 maximum to ensure capacity for planned bare-metal instance counts.

Skipping this validation step invites silent traffic blackholing, where BGP advertisements propagate but the data plane drops packets. Proper upfront planning removes 90% of Amazon EVS networking deployment issues by catching these conflicts early. The operational payoff of rigorous pre-checks shows up in measurable metrics: organizations using such automation frameworks saw a 34% reduction in total migration time, and disciplined validation boosted infrastructure team effectiveness by 30%, freeing teams to focus on architectural optimization rather than troubleshooting preventable connectivity errors.

Deployment speed conflicts with CIDR immutability once the environment launches. Changing an overlay segment after deployment requires tearing down and rebuilding the entire management domain, and teams that ignored these constraints faced significant rework. The cost of a single overlapping subnet definition exceeds the minutes spent validating the sheet.

Preventing Workload Contamination in Amazon EVS Management Subnets

Launching bastion hosts or monitoring agents in Amazon EVS management subnets triggers immediate deployment failure due to strict isolation requirements.

Reserve specific address blocks exclusively for VCF management components, ensuring no customer workloads occupy these ranges.
Segregate vSAN and NSX interface traffic into dedicated subnets distinct from general compute networks.
Validate that all overlay segments remain logically separated from the underlay infrastructure hosting SDDC Manager.

The consolidated architecture model forces VCF appliances and customer VMs to share the same physical domain, so blurred subnet boundaries create a single point of configuration failure. Operators often mistake the shared service access subnet for a general-purpose network zone, but this segment handles critical control-plane handshakes between the AWS and VMware layers. Placing a monitoring agent here consumes IP addresses required for host-vtep mapping and can starve the cluster of the tunnel endpoints it needs. InterLIR advises treating these subnets as read-only infrastructure zones, because any unauthorized ENI attachment halts the entire orchestration workflow. This rigidity reduces operational flexibility; the benefit is guaranteed stability for the management plane.

About

Alexander Timokhin, CEO of InterLIR, brings critical infrastructure expertise to the discussion of Amazon Elastic VMware Service. While his company specializes in optimizing IPv4 address availability, his deep background in IT infrastructure and network architecture makes him uniquely qualified to address the networking complexities of migrating VMware workloads to AWS. As organizations "lift and shift" environments using Amazon EVS, they often face significant challenges regarding IP allocation and subnet design within an Amazon VPC. Timokhin's daily work solving network availability problems directly correlates with ensuring these modern hybrid clouds have sufficient, clean IP resources to function correctly. By connecting InterLIR's mission of transparent IP redistribution with the technical demands of VMware Cloud Foundation, he provides a strategic perspective on building reliable, scalable networks that support smooth cloud transformation without refactoring applications.

Conclusion

Amazon EVS demands a shift from reactive troubleshooting to predictive network governance because the platform's rigid CIDR immutability means post-deployment fixes require total environment reconstruction. As organizations scale beyond pilot phases, the operational debt of overlapping subnets or misplaced management agents compounds rapidly, turning minor configuration drifts into catastrophic orchestration halts. The shared physical domain creates a fragile dependency where unauthorized ENI attachments in service access subnets starve host-vtep mappings, effectively freezing cluster expansion. Teams must accept that operational flexibility decreases to guarantee management plane stability; treating these zones as read-only infrastructure is not a suggestion but a survival requirement for production workloads.

Adopt a strict "infrastructure-as-code validation" mandate for all EVS deployments by Q3, requiring automated subnet reservation checks before any SDDC Manager initialization. Do not rely on manual spreadsheets or periodic audits, as the cost of a single collision exceeds the entire migration budget. Start by auditing your current IP address management (IPAM) logic this week to identify any non-VCF components residing within reserved management blocks, and immediately migrate them to dedicated compute segments before launching new clusters. This proactive segregation prevents the silent accumulation of technical debt that inevitably triggers deployment failures during critical scaling events.

Frequently Asked Questions

How much can Amazon EVS reduce total infrastructure costs compared to legacy models?

Amazon EVS delivers up to 46% cost savings by shifting lifecycle management responsibilities directly to the customer. This financial benefit arises from operating within a consolidated domain that eliminates separate storage billing lines.

What percentage of deployment issues are prevented by proper upfront CIDR and routing planning?

Proper upfront planning removes 90% of Amazon EVS networking deployment issues related to overlapping subnets and route propagation. Neglecting this step often leads to immediate disruptions in the logical control plane and physical ENI paths.

How does leveraging existing VMware licenses impact the upfront software expenditure for new deployments?

Customers applying existing VMware subscriptions to Amazon EVS see a 30% increase in infrastructure team effectiveness during migration. This portability allows organizations to significantly reduce upfront software costs while maintaining familiar operational tools.

What specific instance types does Amazon EVS support for optimizing vSAN storage performance?

The service supports i7i.metal-24xl instances which offer newer generation processors designed for specific cost-performance benefits. These bare-metal options provide direct hardware access and aggregate local NVMe storage without external block dependencies.

How does automation framework adoption affect the total time required for large-scale migrations?

Organizations utilizing automation frameworks saw a 34% reduction in total migration time when deploying Amazon EVS clusters. This efficiency gain allows teams to lift and shift workloads faster without refactoring applications or retraining staff.

interlir

Alexander Timokhin