Amazon EVS: VMware migrations without refactoring code
With public cloud spending projected to reach 45% of enterprise IT budgets by 2027, Amazon EVS delivers the fastest path to migrate VMware workloads without refactoring. Unlike earlier VMware-on-AWS offerings, this model addresses the critical friction of cloud networking architecture while preserving familiar operational tools.
Readers will learn how the service uses a distinct two-layer networking model that separates underlay VPC infrastructure from NSX overlay logic, effectively abstracting workload segments from physical host management. We examine the specific mechanics of BGP routing and how ESXi hosts launched as i4i.metal instances integrate with customer subnets to maintain smooth hybrid operations.
Finally, the discussion covers essential CIDR planning strategies required to avoid IP conflicts between the management domain and workload clusters during deployment. Drawing on AWS data about the shift toward private AI deployments, we outline how organizations can mitigate data lock-in risks while modernizing their network perimeter. This approach ensures that migrating teams retain the vSphere and vSAN APIs without sacrificing the scalability of the AWS backbone.
The Role of Amazon EVS in Modernizing VMware Cloud Infrastructure
Amazon EVS as an AWS-Managed VCF Automation Framework
Public cloud spending is projected to reach 45% of enterprise IT budgets by 2027, driving demand for lift-and-shift capabilities. VMware Cloud Foundation deploys directly inside Amazon Virtual Private Cloud through the AWS-managed automation framework. This system provisions vSphere, vSAN, and NSX components onto bare-metal instances within customer subnets. Separating underlay VPC routing from overlay NSX segments preserves existing operational tools. The architecture mandates dedicated /24 subnets where no other workloads may reside, creating rigid capacity planning constraints. Operators must allocate distinct CIDR blocks for management, vSAN, and overlay traffic before deployment begins. Misalignment in these non-overlapping requirements causes immediate provisioning failures. Network teams balance rapid migration speed against the inflexibility of fixed subnet sizes.
Deploying NSX Managers and ESXi Hosts in the EVS Overlay Layer
NSX Managers deploy inside the management domain to orchestrate logical networks. This overlay layer abstracts customer workloads from physical underlay constraints using T0/T1 gateways. Control plane signaling separates from data plane forwarding across distinct virtual segments. Administrators manage two parallel routing domains simultaneously, which increases operational complexity. Preserving familiar VMware APIs often means accepting reduced visibility into physical host placement decisions.
Specific VPC constructs enable the underlay infrastructure to function correctly. According to the AWS EVS concepts guide (https://docs.aws.amazon.com/evs/latest/userguide/concepts.html), the service provisions elastic network interfaces into a service access subnet for management connectivity. These interfaces link ESXi hosts running in customer-selected subnets to the broader VPC routing table. Traffic flows through default route tables rather than custom configurations per host. Misaligned security groups on these ENIs block necessary vSphere heartbeat traffic immediately after deployment.
Failures often stem from assuming overlay policies apply to underlay management traffic. Underlay routes do not inherit NSX firewall rules automatically. Operators must explicitly configure security groups for the management subnet to prevent isolation of the SDDC cluster. Precise CIDR planning avoids overlap between overlay segments and VPC ranges.
Navigating VPC Route Table Limits and Security Group Enforcement Gaps
A /24 subnet supports exactly 112 hosts due to address consumption constraints. This hard ceiling defines the maximum scale for any single VPC route table segment within the deployment. The mechanism reserves two IP addresses per host specifically for host-vtep functionality, reducing available capacity for virtual machines. Operators cannot apply standard EC2 scaling models without triggering address exhaustion errors. According to AWS documentation, EC2 security group rules fail to enforce traffic policies on EVS elastic network interfaces attached to VLAN subnets. Network access control lists (ACLs) become the mandatory enforcement point for all north-south traffic flows. This architectural shift forces a departure from instance-level firewalls toward subnet-wide policy application. Relying on stateless ACLs introduces operational friction when managing return-path traffic for complex applications. Traditional VPC routing relies on distributed security groups, whereas the Route Server model centralizes path selection but decentralizes security enforcement.
Ignoring these subnet limits causes cascading BGP withdrawal storms during host additions. Lost overlay connectivity windows represent a measurable cost. CIDR blocks require explicit planning for the two-address overhead per node; failing to account for it noticeably reduces achievable cluster density.
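The two-address overhead makes the 112-host ceiling easy to sanity-check. The sketch below is illustrative only: AWS reserves five addresses in every subnet, and the `appliance_reserve` figure is an assumed allowance for management appliances chosen so the arithmetic reproduces the documented ceiling, not a published AWS constant.

```python
import ipaddress

def max_hosts(cidr: str, addrs_per_host: int = 2, appliance_reserve: int = 27) -> int:
    """Estimate ESXi host capacity of an EVS subnet.

    AWS reserves 5 addresses in every subnet; each host consumes two
    addresses (host + host-vtep). `appliance_reserve` is an illustrative
    allowance for management appliances, not an official AWS value.
    """
    net = ipaddress.ip_network(cidr)
    usable = net.num_addresses - 5  # AWS reserves 5 IPs per subnet
    return (usable - appliance_reserve) // addrs_per_host

print(max_hosts("10.0.0.0/24"))  # 112 under these assumptions
```

Under these assumed reservations, a /24 tops out at 112 hosts, matching the ceiling above; a larger block raises the ceiling but still pays the two-address tax per node.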
Inside the EVS Two-Layer Networking Model and BGP Mechanics
Defining the EVS Two-Layer Model: Underlay VPC and NSX Overlay
Per the Key Components Deep Dive, ESXi hosts launch as Amazon Elastic Compute Cloud bare-metal instances such as i4i.metal inside customer VPC subnets. This physical foundation constitutes the underlay layer, where IP connectivity relies entirely on standard AWS route tables rather than virtual switches. The architecture mandates that every host connects via elastic network interfaces to enable management traffic without hypervisor abstraction. Operators must recognize that security groups do not enforce policies on these specific interfaces, requiring strict network access control lists for perimeter defense.
Overlay networking operates independently above this physical substrate using logical segments set by software policy. Per Key Components Deep Dive, three clustered NSX Manager appliances run within the initial cluster to orchestrate these overlay transport zones centrally. These managers configure T0 Gateways and T1 Gateways to abstract workload placement from underlying physical topology constraints. The separation allows portable networking policies but introduces a coordination dependency between cloud infrastructure teams and virtualization administrators.
| Layer Component | Function | Management Domain |
|---|---|---|
| Underlay VPC | Physical IP transport | AWS Console |
| NSX Overlay | Logical segmentation | vCenter / NSX Manager |
| Route Server | BGP Peer termination | AWS Network Manager |
The critical tension lies in operational ownership; cloud teams control the underlay CIDR blocks while server teams manage overlay prefixes.
Based on the Considerations documentation, Amazon EVS requires two VPC Route Server endpoints within the same Availability Zone to establish resilient north-south traffic flows. The mechanism pairs each of the two NSX Edge T0 nodes with a distinct endpoint, enforcing an Active/Standby topology where only the primary edge processes external routing updates. This design eliminates single points of failure in the underlay while maintaining a strict one-to-one mapping between virtual edges and physical route server instances. However, the architecture supports only default BGP keepalive timers because multi-hop Bidirectional Forwarding Detection (BFD) remains unsupported, leaving failure detection dependent on standard hold-down intervals. Operators must accept slower convergence during link flaps compared to on-premises deployments that use millisecond-scale BFD polling.
| Feature | Active Node | Standby Node |
|---|---|---|
| BGP Session | Established | Idle |
| Route Advertisement | Full Prefixes | None |
| Traffic Flow | All North-South | Zero |
Network teams must monitor the active T0 gateway closely, as it represents a singular throughput bottleneck for the entire cluster. Undersizing this component risks saturating the instance type before compute resources reach capacity limits.
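To see why the missing BFD support matters, compare worst-case failure detection under the two mechanisms. The values below are common defaults (a 180-second BGP hold time; 300 ms BFD intervals with a multiplier of 3) used purely for illustration; verify the timers your NSX Edge peering actually negotiates.

```python
def bgp_detection_time(hold_seconds: float = 180.0) -> float:
    # Worst case: the peer dies right after a keepalive, so failure is
    # declared only when the hold timer expires.
    return hold_seconds

def bfd_detection_time(interval_ms: float = 300.0, multiplier: int = 3) -> float:
    # BFD declares failure after `multiplier` consecutive missed packets.
    return interval_ms * multiplier / 1000.0

print(bgp_detection_time())  # 180.0 seconds without BFD
print(bfd_detection_time())  # 0.9 seconds with these example BFD settings
```

The two-orders-of-magnitude gap is why link flaps on the active T0 translate into noticeable outage windows in this topology.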
Mitigating MTU Mismatches and T0 Throughput Bottlenecks in EVS
According to the Considerations documentation, current-generation EC2 instances support jumbo frames up to 9001 bytes while Transit Gateway caps at 8500 bytes. This discrepancy creates a silent failure mode where NSX transport packets exceeding the lower threshold trigger fragmentation or drops across the underlay boundary. Operators must manually align MTU settings on logical switches to the lowest common denominator rather than assuming end-to-end jumbo frame compatibility. The cost is reduced payload efficiency for workloads traversing hybrid boundaries compared to pure on-premises environments.
InterLIR analysis indicates that undersized T0 gateways frequently saturate during bulk data transfers, creating a hard ceiling on north-south throughput. The active NSX Edge node processes all external traffic in this Active/Standby model, making vertical scaling of the edge instance more impactful than horizontal expansion. Monitoring data path metrics becomes mandatory because standard CPU alerts often lag behind actual packet processing bottlenecks. A qualitative shift occurs where network teams prioritize gateway sizing over host count during initial capacity planning phases.
| Component | Max Frame Size | Risk Factor |
|---|---|---|
| EC2 Instances | 9001 bytes | Low within VPC |
| Transit Gateway | 8500 bytes | High for hybrid flows |
| Direct Connect | 8500 bytes | High for on-prem sync |
The architectural tension lies between maximizing local vSAN performance with large frames and maintaining smooth connectivity with AWS managed services. Ignoring this constraint results in intermittent connectivity loss that standard troubleshooting tools fail to attribute to size mismatches.
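A quick way to avoid the silent-drop trap is to compute the effective overlay MTU from the smallest hop on the path. The `geneve_overhead` allowance below is an assumption (actual Geneve header size varies with the options carried), so treat the result as a planning ceiling rather than a guarantee.

```python
def effective_overlay_mtu(path_mtus: dict[str, int], geneve_overhead: int = 100) -> int:
    """Largest inner-packet size that survives every hop on the path.

    `geneve_overhead` is a conservative allowance for Geneve
    encapsulation headers -- an assumption, since actual overhead
    depends on the options carried.
    """
    bottleneck = min(path_mtus.values())
    return bottleneck - geneve_overhead

path = {
    "ec2_instances": 9001,    # jumbo frames inside the VPC
    "transit_gateway": 8500,  # TGW cap
    "direct_connect": 8500,   # DX cap for on-prem sync
}
print(effective_overlay_mtu(path))  # 8400: keep logical-switch MTU at or below this
```

Sizing logical switches to this computed floor, instead of the 9001-byte instance maximum, is what prevents the intermittent hybrid-flow drops described above.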
Implementing Network Prerequisites and CIDR Planning Strategies
Mandatory Non-Overlapping CIDR Blocks for EVS Infrastructure

Strict segregation of address space stops silent traffic blackholing caused by BGP prefix advertisement conflicts between the underlay VPC and NSX overlay domains. Operators allocate distinct ranges that avoid intersection with on-premises networks connected via AWS Direct Connect. Any overlap forces a complete redeployment because route summarization cannot resolve conflicting prefixes within the same routing table.
- Assign a minimum /24 CIDR specifically for underlay infrastructure components.
- Isolate management, vSAN, and NSX interfaces into separate, non-contiguous subnets.
- Verify that overlay segments use hierarchical allocation to avoid intersecting primary VPC addresses.
- Exclude all customer workloads, including bastion hosts, from Amazon EVS-managed subnets.
Ignoring these boundaries results in total loss of north-south connectivity for affected virtual machines. Network teams treat CIDR planning as a hard gate rather than a flexible design parameter. The cost is rigid upfront planning, yet this constraint prevents downstream routing chaos.
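The overlap check itself is mechanical and worth automating before provisioning. A minimal sketch using Python's standard `ipaddress` module, with hypothetical plan values:

```python
import ipaddress

def find_overlaps(allocations: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named CIDR blocks that intersect."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in allocations.items()}
    names = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]

# Hypothetical allocation plan; the on-prem range collides with the overlay supernet.
plan = {
    "underlay_mgmt": "10.10.0.0/24",
    "vsan": "10.10.1.0/24",
    "nsx_overlay": "172.16.0.0/16",
    "on_prem": "172.16.8.0/24",
}
print(find_overlaps(plan))  # [('nsx_overlay', 'on_prem')]
```

Running a check like this as a pre-deployment gate catches the conflicts that would otherwise force a complete redeployment.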
Allocating Minimum /24 Subnets for VCF Underlay Components
A minimum /24 CIDR per VPC supports management, vSAN, NSX interfaces, and appliances. Amazon EVS consumes two addresses per host for host-vtep functionality, capping a single subnet at 112 hosts. Placing customer workloads like bastion hosts in these ranges triggers address exhaustion that halts cluster scaling. Silent traffic blackholing occurs during BGP prefix advertisement if overlap exists with on-premises networks.
- Assign dedicated /24 blocks to each VCF component type to prevent broadcast domain contention.
- Exclude all non-VCF resources from these subnets to maintain certification compliance.
- Validate DNS resolution end-to-end before launching the automation framework.
| Component Type | Allocation Purpose | Constraint |
|---|---|---|
| Management | SDDC Manager, vCenter | Dedicated /24 only |
| vSAN | Storage traffic | Non-overlapping |
| NSX Interfaces | T0/T1 Gateways | No security groups |
| Host vMotion | Live migration | Max 112 hosts |
InterLIR review indicates that sharing subnets introduces unpredictable latency spikes during vSAN resync operations due to competing I/O flows. Reduced VPC address space efficiency is the drawback, yet this sacrifice prevents catastrophic deployment failures where route tables cannot distinguish overlay from underlay traffic. Proper sizing ensures NSX Edge nodes maintain stable eBGP sessions without packet loss from saturated interface queues.
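Carving the dedicated /24 blocks out of a single supernet keeps the allocations non-overlapping by construction. The supernet and component names below are illustrative, not AWS-mandated values:

```python
import ipaddress

# Carve dedicated /24 blocks for each VCF component from one supernet.
# A /22 yields exactly four /24s -- one per component type in this sketch.
supernet = ipaddress.ip_network("10.20.0.0/22")
components = ["management", "vsan", "nsx_interfaces", "host_vmotion"]

allocations = dict(zip(components, supernet.subnets(new_prefix=24)))
for name, block in allocations.items():
    print(f"{name}: {block}")
```

Allocating hierarchically from one reserved supernet also simplifies summarization toward on-premises routers later.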
Preventing Blast Radius Expansion with Dedicated EVS VPCs
Shared services within the EVS VPC create uncontained failure domains that complicate auditing. The mechanism isolates VCF components by restricting the underlay to management, vSAN, and NSX interfaces only. Misconfigurations in these areas are a leading cause of deployment failures. Placing non-EVS workloads adjacently increases the risk of overlapping CIDR ranges causing silent traffic blackholing. Operators validate network prerequisites before launch to prevent BGP prefix conflicts between overlay and underlay layers.
- Deploy Amazon EVS resources in a VPC dedicated exclusively to infrastructure components.
- Connect adjacent workload VPCs via Transit Gateway to enforce clear blast-radius boundaries.
- Utilize the AWS spreadsheet tool to verify CIDR separation prior to provisioning.
Clearer cost allocation and simplified compliance auditing emerge for regulated environments. Limiting the scope of route table modifications reduces the surface area for human error during scaling events. This architectural choice trades minor inter-VPC latency for significant operational safety.
Optimizing Security Policies and Resolving Common Deployment Failures
Enforcing L2-L7 Policies with NSX Distributed Firewall and NACLs
vDefend DFW enforces L2–L7 rules at the hypervisor kernel while NACLs govern underlay traffic. This architecture segregates east-west micro-segmentation from north-south perimeter control because EC2 security groups do not apply to Amazon EVS VLAN interfaces. Operators must deploy NACLs for protocols like BGP and DNS since standard security group enforcement is absent on these specific ENIs. Stateful inspection logic differs between the overlay and underlay domains, introducing operational friction. InterLIR examination indicates that relying on vDefend for all filtering creates a blind spot for underlay management traffic unless explicitly mirrored in AWS controls. Network engineers configure parallel rule sets to maintain consistent posture across both layers.
Routing DNS queries through dedicated resolver endpoints prevents issues arising from assumptions that default VPC resolution functions identically for overlay VMs. Misalignment causes silent packet loss during initial cluster formation when BGP sessions fail to establish. The cost is immediate connectivity failure.
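Because security groups are inert on these interfaces, the NACL becomes the entire policy surface. The sketch below models a minimal inbound rule set for a management subnet; rule numbers, CIDRs, and port choices are placeholders, and the explicit ephemeral-port entry exists because NACLs are stateless and cannot track return traffic:

```python
# Illustrative NACL rule model for an EVS management subnet.
# All values are placeholders, not prescribed AWS settings.
MGMT_CIDR = "10.10.0.0/24"

inbound = [
    {"rule": 100, "proto": "tcp", "port": 179, "src": MGMT_CIDR, "action": "allow"},            # BGP peering
    {"rule": 110, "proto": "udp", "port": 53, "src": MGMT_CIDR, "action": "allow"},             # DNS queries
    {"rule": 120, "proto": "tcp", "port": 53, "src": MGMT_CIDR, "action": "allow"},             # DNS over TCP
    {"rule": 130, "proto": "tcp", "port": (1024, 65535), "src": "0.0.0.0/0", "action": "allow"},# stateless return path
    {"rule": 32767, "proto": "all", "port": None, "src": "0.0.0.0/0", "action": "deny"},        # default deny
]

for r in inbound:
    print(r["rule"], r["proto"], r["port"], r["action"])
```

Keeping a rule model like this under version control makes it easier to mirror the vDefend policy into AWS controls and avoid the underlay blind spot noted above.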
Deploying Static Routes for Transit Gateway and Cloud WAN Connectivity
Transit Gateway fails to auto-import NSX overlay prefixes, requiring manual static route entries for each CIDR range. Operators explicitly map every NSX overlay segment to the Amazon EVS VPC attachment within the Transit Gateway route table. This manual step bypasses the dynamic learning capability of the VPC Route Server, which propagates routes only within the local VPC context. Hybrid traffic destined for virtual machines hits a black hole at the gateway edge without these static pointers. Maintaining route currency imposes an operational burden as new workload domains expand the overlay address space. InterLIR evaluation indicates that failure to update these tables immediately after provisioning results in total connectivity loss for on-premises users attempting to reach cloud resources.
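Because the Transit Gateway learns nothing dynamically from the overlay, the static entries are easy to generate programmatically before applying them through the console or API. The attachment ID below is a placeholder:

```python
def tgw_static_routes(overlay_cidrs: list[str], evs_attachment_id: str) -> list[dict]:
    """Build the static route entries an operator must add to the TGW
    route table, one per NSX overlay segment. The attachment ID is a
    placeholder; applying the entries is left to the operator."""
    return [
        {"DestinationCidrBlock": cidr, "TransitGatewayAttachmentId": evs_attachment_id}
        for cidr in overlay_cidrs
    ]

routes = tgw_static_routes(["172.16.10.0/24", "172.16.20.0/24"], "tgw-attach-0123example")
for r in routes:
    print(r)
```

Regenerating this list whenever a workload domain adds a segment, and diffing it against the live route table, keeps the tables current as the overlay address space grows.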
Direct Connect termination points further constrain design choices for hybrid architects. Direct Connect transit VIFs must anchor on Transit Gateway rather than the Amazon EVS underlay VPC. This architectural mandate prevents direct BGP peering between on-premises routers and NSX Edge nodes over private virtual interfaces. All external traffic routes through the AWS global network backbone before entering the VMware environment.
Stricter change management processes become necessary where network updates require coordinated updates across multiple AWS services. InterLIR recommends validating DNS attributes remain enabled throughout this process to prevent resolution failures during cutover events. Delays in synchronization break cross-premise communication paths instantly.
Avoiding Blind Spots in Overlay Traffic Monitoring and Cost Leaks
VPC Flow Logs capture only underlay traffic on Amazon EVS ENIs, leaving NSX overlay exchanges invisible to standard AWS telemetry. Reliance on Broadcom VMware tools becomes mandatory because the hypervisor encapsulates east-west traffic before it reaches the physical interface where AWS monitoring attaches. This architectural separation creates a total blind spot for operators expecting native visibility into logical segment flows. Visibility gaps are not the only concern: traffic mirroring sessions incur hourly charges per ENI even if the source instance is stopped or terminated. Billing behavior persists until explicit session deletion, creating potential cost leaks for forgotten diagnostic configurations. InterLIR assessment indicates that DNS resolution failures often stem from this same monitoring disconnect, where operators cannot trace overlay-specific query drops without NSX Manager introspection.
- Deploy vRealize Network Insight or equivalent VCF NSX tools to visualize overlay topology and flow data.
- Audit active traffic mirroring sessions weekly to terminate unused configurations and halt unnecessary billing accrual.
- Correlate NACL logs with vDefend DFW events to reconstruct full-path transaction histories across both network layers.
- Review billing dashboards monthly to identify lingering charges from dormant diagnostic sessions.
Blind adherence to underlay metrics guarantees incomplete incident response data during production outages. Full visibility requires integrating toolsets from both the cloud provider and the virtualization layer.
About
Nikita Sinitsyn, Customer Service Specialist at InterLIR, brings eight years of telecommunications expertise to the complexities of cloud networking. His daily work managing RIPE database operations and ensuring clean IP reputation directly correlates with the critical network architecture required for Amazon Elastic VMware Service (Amazon EVS). As organizations lift and shift VMware workloads to AWS, maintaining reliable connectivity and secure IP resources becomes paramount. Sinitsyn's experience in resolving technical support challenges within the IPv4 marketplace provides a unique perspective on the infrastructure demands of running VMware Cloud Foundation inside an Amazon VPC. At InterLIR, a Berlin-based leader in transparent IPv4 resource redistribution, the team understands that modernizing networks requires not just compute power, but reliable, conflict-free addressing. This article bridges Sinitsyn's practical knowledge of IP availability with the strategic implementation of Amazon EVS, offering readers actionable insights on building resilient hybrid environments without refactoring applications.
Conclusion
The initial allure of managed VMware on AWS dissolves when operational complexity scales beyond pilot phases, specifically where overlay blindness creates untraceable latency and unchecked billing leaks. As enterprises pivot toward sovereign cloud architectures and AI-driven hyperscaling by 2027, the friction between native AWS telemetry and virtualized network layers will become a critical bottleneck. Relying solely on underlay metrics guarantees fragmented incident response, leaving teams unable to diagnose east-west traffic failures that occur entirely within the hypervisor. This architectural disconnect demands a strategic shift: organizations must treat hybrid visibility as a non-negotiable prerequisite before expanding workloads.
Deploy a unified observability stack that ingests NSX-specific flow data alongside VPC logs within the next quarter to prevent diagnostic blackouts during peak migration windows. Do not attempt to scale production traffic until you can correlate logical segment flows with physical interface metrics in real-time. Start this week by auditing all active traffic mirroring sessions across your accounts; immediately terminate any diagnostic configurations attached to stopped instances to halt unnecessary hourly accrual. This single action stops the bleeding of wasted budget while highlighting the visibility gaps that will otherwise cripple your multi-cloud durability. True control requires seeing the traffic that standard cloud tools simply cannot detect.