Cloud WAN fixes hybrid routing across regions

Betsson migrated from complex mesh topologies to eliminate rising costs driven by excessive Direct Connect links. This case study argues that segment-based routing offers a superior, cost-effective alternative to traditional Transit Gateway deployments for global operators.

Facing operational overhead across three continents, Betsson replaced their initial architecture to resolve specific bottlenecks: the maintenance burden of a full mesh topology, routing conflicts in Active/Active Hub & Spoke setups, and escalating expenses from redundant gateways. As Amazon Web Services details, the gaming giant leveraged automatic full mesh peering and dynamic routing within a single core network to simplify operations without incurring downtime. The transition specifically targeted the complexity plaguing their multi-region workload distribution.

Readers will learn how Cloud WAN streamlines global hybrid connectivity by extending segments across multiple regions while integrating natively with Network Firewall for east-west inspection. The discussion covers the critical planning phase, including BGP ASN documentation and IP address allocation strategies required for a smooth migration. Finally, the analysis reveals why abandoning rigid hub-and-spoke models provides the scalability necessary for high-volume gaming environments.

The Role of AWS Cloud WAN in Modernizing Global Hybrid Connectivity

AWS Cloud WAN Core Network and Core Network Edge Definitions

The AWS Cloud WAN architecture centers on a "core network" acting as a global construct for attaching VPCs and Direct Connect Gateways, per AWS documentation. This logical boundary replaces manual peering meshes with centralized policy enforcement across regions. Operators define segmentation rules once, and the system propagates routing logic automatically. Maintaining disjointed Transit Gateway instances, by contrast, notably increases operational overhead.

A Core Network Edge functions as the local connection point where attachments comply with set policies according to AWS whitepaper specifications. Traffic enters and exits the global fabric strictly through these validated nodes. Betsson Group utilized this model to connect workloads across Europe and South America without rebuilding individual tunnels. The design ensures that any communication between segments passes through a firewall for inspection. Regional availability gaps create deployment friction during initial migration phases. InterLIR notes that unsupported regions require interim Transit Gateway bridges to maintain connectivity standards. This hybrid approach adds temporary complexity until full global coverage arrives.

| Feature | Core Network | Core Network Edge |
|---|---|---|
| Scope | Global | Regional/Local |
| Function | Policy definition | Traffic entry/exit |
| Dependency | None | Requires Core Network |

Network engineers shift from device-level configuration to intent-based policy management. Routing tables update dynamically as new attachments join the specified segments. Human error risks drop in large-scale environments where 75% of enterprise data will soon process at the edge. Centralized control simplifies compliance auditing for gaming operators handling sensitive user transactions globally.
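As a rough illustration of this intent-based model, the sketch below maps attachment tags to segments and leaves unmatched attachments unclassified. The segment names and tag keys are hypothetical, and this is not the actual Cloud WAN policy document syntax, only the shape of the logic:

```python
# Hypothetical segment rules: tag keys and values are illustrative,
# not the real AWS Cloud WAN policy syntax.
SEGMENT_RULES = {
    "production":  {"env": "prod"},
    "development": {"env": "dev"},
    "hybrid":      {"attachment-type": "vpn"},
}

def assign_segment(attachment_tags):
    """Return the first segment whose required tags all match, else None."""
    for segment, required in SEGMENT_RULES.items():
        if all(attachment_tags.get(k) == v for k, v in required.items()):
            return segment
    return None  # unclassified: no route propagation, an effective blackhole

print(assign_segment({"env": "prod"}))             # production
print(assign_segment({"attachment-type": "vpn"}))  # hybrid
print(assign_segment({"team": "payments"}))        # None
```

The key property is that operators edit the rule set, never the per-attachment routing state; the mapping is recomputed as attachments join.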

Betsson's Migration from Transit Gateway to Automatic Full Mesh Peering

Betsson operates across three continents, where manual Transit Gateway meshes caused scaling failures. The initial architecture struggled with routing complexity for Active/Active Hub & Spoke topologies as the network expanded rapidly. Maintaining a full mesh topology notably increased costs for extra Direct Connect links and gateway instances. These constraints forced a reevaluation of how hybrid connectivity in AWS handles global traffic flows.
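The scaling pressure behind this reevaluation is easy to quantify: a full mesh needs a peering link for every pair of nodes, while a centralized core network needs only one attachment per node. A small back-of-the-envelope sketch:

```python
def full_mesh_peerings(n):
    """Point-to-point links needed so every node reaches every other."""
    return n * (n - 1) // 2

def core_network_attachments(n):
    """One attachment per node to a shared core network."""
    return n

# Three regions look manageable; twelve do not.
for regions in (3, 6, 12):
    print(regions, full_mesh_peerings(regions), core_network_attachments(regions))
```

At three nodes the difference is negligible (3 links vs 3 attachments); at twelve it is 66 manually managed links vs 12 attachments.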

Betsson integrated AWS Cloud WAN utilizing automatic full mesh peering with dynamic routing out of the box. This approach leverages BGP to exchange routes dynamically between Cloud WAN and existing gateways without manual static entries. A single core network using multiple segments extended coverage to various regions while preserving isolation boundaries. Operators asking whether Cloud WAN fits a global network must weigh this automation against region availability gaps. Zero-downtime planning was required due to 24×7 operations across different time zones. InterLIR notes that segment-based policies allow sensitive workloads to pass through firewalls while general traffic bypasses inspection nodes. Some regions like sa-east-1 lacked native support during deployment, requiring interim Transit Gateway bridges. This hybrid state introduces temporary complexity until full regional coverage arrives.

| Feature | Legacy Mesh | Cloud WAN |
|---|---|---|
| Peering | Manual | Automatic |
| Routing | Static/Complex | Dynamic BGP |
| Policy | Per-Gateway | Centralized |

Reduced operational hours spent updating route tables manually provides a measurable cost benefit. Global gaming operators face unique pressure to maintain low latency while enforcing strict security perimeters. Automating the underlying fabric allows engineering teams to focus on application logic rather than connectivity plumbing. Failure to automate path selection risks outages during rapid expansion phases common in digital entertainment sectors.

Hub-and-Spoke Complexity Versus Dynamic Policy Segments

Active/Active Hub & Spoke topologies incur high costs via extra Direct Connect links and gateway instances. Legacy architectures force operators to manage complex peering meshes that degrade as regional scale increases. Manual overhead creates a fragile state where configuration drift often leads to routing loops or blackholes. The shift to dynamic policy segments removes the need for point-to-point adjacency management entirely.

| Feature | Hub-and-Spoke Topology | Dynamic Policy Segments |
|---|---|---|
| Routing Logic | Static BGP imports per peer | Declarative core network policy |
| Scaling Model | Linear cost increase per node | Automated full mesh peering |
| Failure Domain | Single link impacts specific spoke | Segment-wide isolation enabled |
| Management | Per-gateway CLI changes | Centralized tag-based mapping |

Betsson achieved a 10-15 percent cost reduction after migrating to the new architecture, and latency for global traffic improved by 100 ms. These metrics validate the operational efficiency of centralized control planes over distributed static configs. Strict adherence to tagging schemas is demanded before deployment begins. A missing tag prevents attachment approval, causing immediate connectivity loss for unclassified resources. Operators must define attachment policies precisely to avoid blocking legitimate traffic flows during the initial cutover window.

Defining Segment Isolation and Mandatory Firewall Traversal Rules

Per the Routing, Sharing, and Security Policy, sensitive VPCs within a segment must use attachment isolation to force firewall traversal. Attachment isolation prevents direct lateral movement, forcing all east-west traffic through an inspection node before reaching its destination. Operators configure this by tagging specific VPCs, ensuring the core network automatically routes flows through the assigned security stack without manual route table edits. Every packet incurs additional latency due to the mandatory hop, which can impact real-time applications if the firewall cluster lacks sufficient throughput capacity. Network teams must size inspection endpoints to handle aggregate segment traffic rather than just north-south bursts.

Inter-segment communication follows a stricter rule: any traffic crossing segment boundaries must pass through a firewall, per the same Routing, Sharing, and Security Policy guidelines. This design enforces a zero-trust boundary between development, production, and hybrid environments by default. Traffic moving from a developer segment to a database segment encounters the same deep packet inspection as external internet traffic. Uniformity increases the blast radius risk if the central firewall policy contains an overly permissive rule affecting multiple segments. A single misconfiguration could inadvertently allow cross-segment leakage that isolation was meant to prevent.

| Traffic Flow | Default Action | Inspection Point |
|---|---|---|
| Intra-segment (Isolated) | Denied Directly | Network Firewall |
| Inter-segment | Denied Directly | Network Firewall |
| Hybrid (On-prem) | Denied Directly | Network Firewall |
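Assuming the simplified rules above, the routing decision reduces to a next-hop function that steers every inter-segment or isolated flow through the inspection node. A minimal sketch with illustrative segment names:

```python
def next_hop(src_segment, dst_segment, src_isolated=False):
    """First hop for a flow under mandatory-inspection rules (sketch)."""
    if src_segment != dst_segment:
        return "firewall"   # inter-segment and hybrid flows: always inspected
    if src_isolated:
        return "firewall"   # intra-segment with attachment isolation enabled
    return "direct"         # same segment, isolation not enforced

print(next_hop("development", "production"))       # firewall
print(next_hop("production", "production", True))  # firewall
print(next_hop("production", "production"))        # direct
```

Only same-segment traffic without the isolation flag ever bypasses inspection, which is why firewall capacity must be sized for aggregate east-west volume.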

External connectivity integrating with the global fabric maps to the Hybrid segment automatically based on attachment type definitions. This classification ensures that AWS Site-to-Site VPN links and Direct Connect Gateways never bypass the security perimeter. Regional outages in the inspection zone can sever connectivity for all dependent segments simultaneously.

Under attachment mapping and classification, external connectivity like Site-to-Site VPN and Direct Connect Gateway maps to the Hybrid segment via tag values. Operators define rules where specific tag keys trigger automatic assignment, eliminating manual route table edits for every new circuit. This mechanism relies on the core network policy document to inspect attachment metadata upon creation. Misaligned tags result in immediate traffic blackholing since unclassified attachments receive no default route propagation. Network engineers must validate tagging schemas before deployment to prevent connectivity loss for critical on-premises links.
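A pre-deployment tagging audit can catch this blackholing failure mode before cutover. A minimal sketch, assuming a hypothetical required `segment` tag key and illustrative attachment IDs:

```python
def find_unclassified(attachments, required_key="segment"):
    """Return IDs of attachments whose missing tag would blackhole traffic."""
    return [a["id"] for a in attachments if required_key not in a.get("tags", {})]

fleet = [
    {"id": "vpn-1",  "tags": {"segment": "hybrid"}},
    {"id": "dxgw-1", "tags": {}},  # untagged: receives no route propagation
]
print(find_unclassified(fleet))  # ['dxgw-1']
```

Running a check like this in CI against the intended attachment inventory turns a silent outage into a failed pipeline step.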

Configuring BGP within this model requires unique ASN ranges to avoid peering conflicts across the global fabric. InterLIR documentation indicates that distinct ASN allocation prevents routing loops when integrating multiple AWS Transit Gateway instances. The planning sequence involves:

  1. Documenting existing on-premises ASN allocations.
  2. Defining a sequential BGP ASN range for the new cloud environment.
  3. Applying these ASNs to attachment types to ensure unique identification.

| Configuration Element | Legacy Approach | Automated Policy |
|---|---|---|
| Route Propagation | Manual peer imports | Dynamic full mesh |
| Segment Mapping | Static route tables | Tag-based classification |
| Firewall Insertion | Per-VPC agent config | Central service insertion |

Firewall integration mandates that all Hybrid segment traffic traverses a central inspection node before reaching internal resources. This design enforces a hard boundary where service insertion policies override standard routing decisions. A tension exists between strict isolation and latency; forcing all hybrid flows through a single region's firewall cluster increases round-trip time for distant spokes. Operators in regions like sa-east-1 historically used AWS Transit Gateway peering as a workaround, but modern tag-based classification removes this dependency where supported. The operational consequence is a reduction in configuration drift, as policy changes apply globally rather than per device.

Based on AWS Cloud WAN BGP Planning, having a unique BGP ASN helped Betsson avoid connectivity issues during migration. Operators must execute a full inventory of all BGP ASNs used across on-premises setups and the entire AWS network environment before deploying segment-based routing. This prevents AS_PATH loops by ensuring every attachment presents a distinct identifier to the global core network.

| Allocation Method | Risk Profile | Operational Outcome |
|---|---|---|
| Random Reuse | High collision probability | Route rejection or blackholing |
| Sequential Range | Predictable auditing | Clean path validation |

The process requires defining a new unique sequential BGP ASN range specifically for the cloud fabric.

  1. Document existing allocations per region and environment.
  2. Identify overlaps between legacy gateways and planned attachments.
  3. Assign non-conflicting values from the reserved sequential block.

Strict sequential allocation creates dependency on centralized number management, slowing down decentralized team provisioning if not automated via tag values. InterLIR guidance suggests maintaining a live ledger of assigned ranges to prevent accidental duplication in hybrid topologies. Skipping this inventory phase invites immediate routing instability that dynamic policies cannot self-heal. Isolating production ASN spaces from test environments prevents ambiguous path selection. A structured approach ensures the core network accepts all advertisements without manual filter tuning. This preparation step directly supports the zero-downtime requirement for global gaming platforms.
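The three-step process above can be sketched as a small allocator that draws from a reserved sequential block while skipping anything already in the inventory ledger. ASN values here are illustrative private-range examples, not Betsson's actual allocations:

```python
def allocate_asns(inventory, start, count):
    """Assign `count` ASNs from a sequential block, skipping collisions."""
    assigned, candidate = [], start
    while len(assigned) < count:
        if candidate not in inventory:
            assigned.append(candidate)
            inventory.add(candidate)  # keep the live ledger current
        candidate += 1
    return assigned

on_prem = {64512, 64514}                 # step 1: documented existing inventory
print(allocate_asns(on_prem, 64512, 3))  # [64513, 64515, 64516]
```

Because the function mutates the shared inventory set, subsequent calls cannot hand out a duplicate, which is the property the live ledger is meant to guarantee.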

Strategic Advantages of Cloud WAN Over Traditional Transit Gateway Deployments

Operational Overhead in Full Mesh Transit Gateway Topologies

Manual peering management scales poorly, creating fragility as node counts rise. Traditional AWS Transit Gateway deployments require explicit definition for every adjacency, forcing engineers to configure point-to-point BGP sessions individually. This structural approach means each new region adds combinatorial complexity to the routing policy. A single configuration error in these static meshes frequently causes route leaks or blackholes across the entire fabric. Operators report that maintaining consistent security postures becomes impossible when manual edits outnumber automated checks. The cost of this overhead is measurable; interconnect charges grow disproportionately as the mesh expands without intelligent aggregation.

| Dimension | Full Mesh Peering | Centralized Policy Model |
|---|---|---|
| Scaling Logic | Exponential configuration growth | Linear attachment addition |
| Error Surface | High (manual static edits) | Low (declarative policy) |
| Routing Updates | Static imports required | Dynamic exchange via BGP |

The critical limitation of the legacy model is its inability to absorb rapid scaling without proportional staff increases. Betsson identified this rigidity as a primary blocker before migrating to AWS Cloud WAN. The consequence is a network architecture that resists change rather than enabling it. Static topologies force operators to choose between stability and agility, a trade-off modern cloud-native designs eliminate.

According to Handling Regions Without Cloud WAN Support, Betsson deployed a single AWS Transit Gateway in sa-east-1 to bridge unsupported infrastructure. This hybrid pattern maps multiple routing domains from the local gateway directly to specific Cloud WAN segments using route table attachments. The mechanism effectively extends the global core network policy into regions lacking native service presence by peering the regional gateway with a supported neighbor.

| Deployment Mode | Cost Structure | Routing Complexity |
|---|---|---|
| Native Cloud WAN | Fixed segment pricing | Low (Automated) |
| Hybrid TGW Peering | Per-attachment fees | Medium (Manual mapping) |
| Full Mesh TGW | Exponential interconnect | High (Static config) |

Operators must configure distinct route table attachments for each environment to maintain strict segmentation boundaries across the peer link. A critical tension exists here; while this approach minimizes capital expenditure by avoiding duplicate firewalls in unsupported zones, it introduces a hard dependency on the stability of the inter-region peering session. If the peer link fails, the isolated region loses all segmented routing intelligence until convergence occurs. InterLIR notes that such architectural compromises often become permanent technical debt if the region never gains native support. Network teams should treat this as a temporary fallback rather than a standard design pattern for greenfield deployments.

Granular Control Versus All-or-Nothing Azure Virtual WAN Approaches

Azure Virtual WAN enforces an "all-or-nothing" global scope that prevents partial network connections. This structural constraint forces enterprises to onboard entire infrastructures simultaneously, eliminating the option for phased regional rollouts. In contrast, AWS Cloud WAN enables operators to attach specific resources like VPCs or VPNs to a core network incrementally. AWS solutions provide superior granular routing control comparable to VRF-lite equivalents. The trade-off is complexity; setting up these precise policies requires deeper engineering expertise than the fully managed Azure alternative. Operators gain the ability to isolate workloads by segment without disrupting global connectivity.

| Feature | AWS Cloud WAN | Azure Virtual WAN |
|---|---|---|
| Deployment Scope | Partial/Granular | Global Mandatory |
| Routing Control | High (VRF-like) | Managed/Limited |
| Phased Migration | Supported | Not Supported |

The ability to connect only part of the network allows for strict security boundaries during transitional states. A rigid global mandate increases the blast radius of configuration errors during initial deployment phases.

Executing Zero-Downtime Migration from Transit Gateway to Cloud WAN

Zero-Downtime Migration Strategy Using Terraform and Dual Attachments

Dashboard showing zero-downtime migration metrics including 24x7 continuity, doubled topology complexity, global data growth from 2.7 to 7.6 billion, and key statistical drivers for migration success.

Infrastructure as Code via Terraform enabled Betsson Services Limited to maintain 24×7 operational continuity throughout the migration. The strategy relied on dual attachments where new Cloud WAN VPNs ran parallel to legacy Transit Gateway links before any traffic shift occurred. BGP local preference attributes steered flow through the new path while the old circuit remained a hot-standby. This approach eliminates hard cutovers that typically trigger outages during global updates. Maintaining two active topologies doubles the transient state complexity for routing tables. Operators must validate that AS-Prepends do not inadvertently create suboptimal return paths during the transition window. Automated validation pipelines become a strict requirement rather than manual CLI checks. Skipping the parallel run phase increases rollback time notably when errors occur according to InterLIR analysis. The cost of this redundancy is measurable in temporary resource duplication, yet it prevents revenue loss from unplanned downtime. Most operators fail to account for the latency variance introduced by asymmetric routing during the switch. Precision in BGP attribute tuning determines whether the migration appears smooth or causes packet reordering.

  1. Deploy new Cloud WAN attachments alongside existing Transit Gateway connections.
  2. Advertise identical prefixes from both gateways with higher local preference on the new link.
  3. Verify inbound flow symmetry using telemetry before decommissioning legacy peering.
  4. Remove obsolete Transit Gateway route table associations only after full traffic stabilization.
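The local-preference steering in steps 2 and 3 can be sketched with a simplified best-path comparator: highest local preference wins, with shortest AS_PATH as the tie-break. Attribute values and gateway names are illustrative, not Betsson's actual configuration:

```python
def best_path(paths):
    """Simplified BGP decision: highest local-pref, then shortest AS_PATH."""
    return max(paths, key=lambda p: (p["local_pref"], -len(p["as_path"])))

paths = [
    {"via": "transit-gateway", "local_pref": 100, "as_path": [64512, 64620]},
    {"via": "cloud-wan",       "local_pref": 200, "as_path": [64513]},
]
print(best_path(paths)["via"])  # cloud-wan
```

With both paths advertised simultaneously, raising local preference on the new attachment shifts traffic while the legacy circuit stays installed as a hot-standby; withdrawing it later (step 4) removes only an unused path.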

Executing Route Table Swaps and IPSEC VPN Cutover Procedures

VPC route tables for private subnets required modification from Transit Gateway to Cloud WAN to finalize the migration, per Betsson Services Limited data. Operators must first attach all target VPCs to appropriate segments within the new core network using Infrastructure as Code. The swap itself involves updating the default route in private subnet tables to point toward the Cloud WAN attachment instead of the legacy Transit Gateway ID. AWS NAT Gateway entries in private subnets were retained for existing VPCs due to IP allowlisting issues. This constraint forces a hybrid routing state where internet-bound traffic bypasses the global mesh while east-west flows traverse it. Retaining legacy egress points creates an asymmetric path risk if firewall policies are not mirrored exactly on the inspection segment.

A new set of VPNs was deployed from on-premises equipment to AWS Cloud WAN to achieve zero-downtime migration for site-to-site IPSEC VPNs. Traffic shifts by manipulating BGP attributes rather than physically flipping switches: engineers apply AS-Prepends to the legacy path and increase local preference on the new Cloud WAN tunnels. Transient routing table bloat occurs during the cutover window as a direct result of this dual-path strategy. If that memory pressure goes unaccounted for, control-plane instability arises exactly when stability is most critical.

  1. Deploy parallel Site-to-Site VPN attachments to the Hybrid segment.
  2. Advertise identical prefixes from both Transit Gateway and Cloud WAN edges.
  3. Apply BGP local preference overrides to favor the new tunnel inbound.
  4. Validate flow logs before withdrawing old routes from the on-premises router.
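The AS-Prepend side of this steering can be sketched as well: when local preference is equal, BGP falls back to AS_PATH length, so padding the legacy path with repeated copies of the local ASN de-prefers it. ASNs and gateway names are illustrative:

```python
def prepend(as_path, asn, times):
    """AS-Prepend: pad a path so BGP de-prefers it on the tie-break."""
    return [asn] * times + as_path

def tie_break(paths):
    """With equal local preference, the shortest AS_PATH wins."""
    return min(paths, key=lambda p: len(p["as_path"]))

legacy = {"via": "tgw-vpn",       "as_path": prepend([65001], 65001, 3)}
new    = {"via": "cloud-wan-vpn", "as_path": [65001]}
print(tie_break([legacy, new])["via"])  # cloud-wan-vpn
```

Prepending is reversible from the on-premises side alone, which is why it pairs well with a zero-downtime plan: removing the pads instantly restores the legacy path as a viable fallback.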

Validation Checklist for Connectivity, Observability, and Failover Testing

Connectivity testing included tests within each segment across regions using test VPCs and from on-premises using test VPNs, per Betsson Services Limited data. Operators must verify path availability before shifting production traffic to the core network. This step confirms that attachment policies correctly map resources to their assigned security zones. Relying solely on synthetic checks may miss asymmetric routing issues introduced by complex BGP attribute manipulation. Network teams should validate return paths explicitly to prevent blackholing during the cutover window. Observability monitoring for all attachments covered throughput graphs with alarm thresholds and AWS Network Manager events for blackhole routes.
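Explicit return-path validation can be sketched as a symmetry check over traced hops, assuming hop lists collected from flow logs or traceroutes. This is a deliberately strict mirror test; a production check would tolerate equal-cost path variations:

```python
def is_symmetric(forward_hops, return_hops):
    """Strict mirror test: the return path should retrace the forward path."""
    return list(reversed(forward_hops)) == return_hops

fwd = ["vpc-a", "firewall", "vpc-b"]
print(is_symmetric(fwd, ["vpc-b", "firewall", "vpc-a"]))  # True: symmetric
print(is_symmetric(fwd, ["vpc-b", "vpc-a"]))              # False: skipped the firewall
```

An asymmetric result like the second case is exactly the failure mode synthetic forward-only probes miss: the request succeeds, but the reply bypasses inspection or blackholes.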

About

Nikita Sinitsyn Customer Service Specialist at InterLIR brings eight years of telecommunications expertise to the complex discussion of AWS Cloud WAN and hybrid connectivity. His daily work managing RIPE and ARIN database operations, BGP routing, and IP reputation directly correlates with the fundamental network infrastructure required for reliable cloud architectures. At InterLIR, a Berlin-based marketplace specializing in IPv4 resources, Nikita ensures clients secure clean, reliable IP addresses essential for expanding global networks. This hands-on experience with internet registry protocols and spam control provides him unique insight into the challenges Betsson Services Limited faced while scaling their gaming platforms across multiple continents. By understanding the critical nature of network availability and address management, Nikita effectively bridges the gap between raw IP resource allocation and advanced cloud networking solutions like AWS Cloud WAN, offering a practical perspective on achieving smooth global connectivity.

Conclusion

Scaling hybrid connectivity reveals that static routing policies collapse under the weight of dynamic global traffic, creating silent blackholes that synthetic checks miss. While latency improvements are measurable, the true operational tax emerges from unmanaged BGP oscillation during partial link degradation, not total failure. Enterprises ignoring hysteresis tuning will face recurring outages as recovery paths compete with degraded primaries, turning a durability feature into a stability liability. The window for passive observation has closed; architects must treat route propagation states as critical application dependencies rather than background noise.

Adopt a strict "observe-before-withdraw" mandate for all path modifications by Q3, requiring explicit return-path validation before shifting production workloads. Do not rely on standard CloudWatch metrics alone, as their inherent lag masks real-time forwarding plane fractures. Instead, correlate control-plane events directly with data-plane loss statistics to catch asymmetric routing before users report latency spikes. This shift demands moving beyond basic availability checks to behavioral verification of complex attribute manipulation.

Start this week by auditing your current BGP local preference settings against simulated recovery scenarios where the primary link remains partially impaired. Verify that your network does not oscillate when a recovered path offers worse performance than a secondary alternative. This single test prevents the most common cause of post-migration instability in large-scale deployments.

Frequently Asked Questions

What specific pricing factors determine AWS Cloud WAN costs for global deployments?
Pricing depends on four factors, chief among them core network edges and attachments. Data processing charges apply to gigabytes sent from VPCs or VPNs to the edge, integrating third-party SD-WAN costs directly into the billing model for operators.
How does segment-based routing reduce human error in large-scale enterprise environments?
Centralized control simplifies compliance auditing significantly for global gaming operators handling transactions. This approach reduces human error risks in large-scale environments where 75% of enterprise data will soon process at the edge according to current architectural shifts.
What bottlenecks forced Betsson to replace their full mesh Transit Gateway topology?
Betsson replaced complex meshes to fix routing conflicts and excessive Direct Connect link costs. Their initial architecture struggled with scaling failures across three continents, prompting a switch to automatic full mesh peering with dynamic BGP routing capabilities.
How do Core Network Edges function differently than the global Core Network?
Core Network Edges act as local connection points where traffic enters and exits the fabric. While the Core Network defines global policies, these regional edges ensure attachments comply with set rules before data moves between segments globally.
What planning steps are critical for achieving zero-downtime migration to Cloud WAN?
Operators must document existing BGP ASN ranges and IP address allocations first. This preparation ensures unique sequential numbering prevents connectivity issues while allowing sensitive workloads to pass through firewalls during the transition period without service interruption.
Nikita Sinitsyn
Customer Service Specialist