Airgapped SageMaker: Zero Public Internet Setup


Deploying Amazon SageMaker Unified Studio across three Availability Zones creates the mandatory foundation for a compliant, air-gapped architecture.

The industry's 2026 pivot toward serverless options and native AI integration demands that database services connect directly to tools like Bedrock within strictly isolated networks, rendering public internet exposure unacceptable for regulated workloads. As noted in recent AWS documentation, this shift ensures sensitive data never traverses the public internet while maintaining full functionality for data cataloging and machine learning workflows.

This guide dissects the mechanics of constructing a custom airgapped VPC that distributes private subnets to guarantee high-availability without public access. We move beyond theory to execute a compliant deployment, enforcing granular network control while integrating existing private data sources through these secured pathways.

The Role of Air-Gapped VPCs in Secure AI Infrastructure

Air-Gapped VPC Architecture for SageMaker Unified Studio

An air-gapped VPC eliminates public internet exposure by routing all SageMaker Unified Studio traffic exclusively over the AWS backbone via AWS PrivateLink. Officially announced in January 2026, this architecture replaces transfers over public endpoints with fully private network pathways to satisfy HIPAA and FedRAMP mandates. The result is a fully private network built on Amazon Virtual Private Cloud with no public internet exposure.

Traffic routing uses AWS PrivateLink to ensure customer data never touches the public internet infrastructure. Organizations implementing this airgapped architecture gain simplified compliance auditing capabilities while integrating with strictly controlled private data sources.

Operational costs include hourly charges for VPC Interface Endpoints plus data processing fees that vary by region. Strict isolation competes with service breadth; adding non-mandatory endpoints for advanced compute features expands the attack surface slightly while enabling broader functionality. Operators must balance strict FedRAMP boundaries against the need for diverse tooling within the airgapped environment.

Bring-your-own VPC deployment reuses existing infrastructure to enforce company-specific compliance requirements. Most customers select the bring-your-own (BYO) VPC. This approach enables organizations to integrate SageMaker Unified Studio with strictly controlled private data sources without internet exposure.
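The cost structure described above can be sketched as simple arithmetic. This is a rough estimate only, assuming each interface endpoint is billed per Availability Zone per hour and that data processing is billed per gigabyte; the function name and the rates in the usage lines are placeholders, not published prices.

```python
def interface_endpoint_monthly_cost(
    endpoint_count: int,
    az_count: int,
    hourly_rate_usd: float,
    gb_processed: float,
    per_gb_rate_usd: float,
    hours_per_month: int = 730,
) -> float:
    """Estimate monthly spend: hourly charge per endpoint per AZ, plus data processing."""
    hourly_total = endpoint_count * az_count * hourly_rate_usd * hours_per_month
    data_total = gb_processed * per_gb_rate_usd
    return round(hourly_total + data_total, 2)


# Placeholder rates for illustration only; look up the real prices for your region.
estimate = interface_endpoint_monthly_cost(
    endpoint_count=4,
    az_count=3,
    hourly_rate_usd=0.01,
    gb_processed=100,
    per_gb_rate_usd=0.01,
)
print(estimate)
```

Because the hourly term scales with both endpoint count and zone count, each optional endpoint added across three zones triples its hourly line item.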

Automated Quick Set Up templates generate a CloudFormation stack that creates VPCs and subnets instantly. The quick create stack using template URL is not intended for production use. Rapid prototyping benefits from this speed, yet enterprise compliance mandates manual subnet distribution across three Availability Zones.

| Deployment Model | Primary Use Case | Compliance Status |
| --- | --- | --- |
| BYO VPC | Production workloads | Full audit control |
| Quick Set Up | Rapid prototyping | Non-production only |

Generative AI protection requires defense-in-depth architectures that default templates cannot guarantee. Enterprises deploying sensitive model weights must avoid the shared tenancy risks inherent in automated wizards. The main limitation of Quick Set Up is that security groups cannot be customized before domain creation. Production environments demand pre-configured network firewalls to maintain reliability during integration. Operators who choose automation give up the granular network controls that exceed the default configurations of alternative platforms. The default CloudFormation template also creates an Internet Gateway, violating strict isolation mandates for regulated workloads.

Interface endpoints like `com.amazonaws.${region}.glue` handle metadata, while the `com.amazonaws.${region}.s3` gateway endpoint manages dataset storage. Interface types rely on AWS PrivateLink to route STS role assumptions through elastic network interfaces rather than route tables. Gateway types inject specific prefixes directly into the VPC routing table, bypassing the NAT requirement entirely for object storage access.

| Feature | Interface Endpoint | Gateway Endpoint |
| --- | --- | --- |
| Target Services | Glue, STS, DataZone | S3, DynamoDB |
| Routing Method | Elastic Network Interface | Route Table Entry |
| Cost Model | Hourly + Data Processing | Free |
| Availability | Single AZ per ENI | Regional Redundancy |

Operators must provision three Availability Zones to prevent single-point failures during Glue catalog updates. The cost implication involves paying hourly rates for every interface endpoint plus data processing fees, unlike the free gateway model. A frequent deployment error involves attaching interface endpoints to public subnets, which exposes private API calls to unnecessary security group complexity. Traffic for `com.amazonaws.${region}.sts` fails silently if the associated security group blocks egress to the specific PrivateLink service IP range. This architectural distinction dictates that SageMaker projects cannot assume IAM roles without explicit interface configuration, regardless of S3 gateway presence.
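The difference between the two endpoint types shows up directly in the request parameters. The sketch below only builds parameter dictionaries shaped for the EC2 `create_vpc_endpoint` API (for example via `boto3.client("ec2").create_vpc_endpoint(**params)`); it makes no AWS calls. The helper name, resource IDs, and region are hypothetical.

```python
from typing import Any

INTERFACE_SERVICES = ("glue", "sts", "datazone")
GATEWAY_SERVICES = ("s3",)


def endpoint_requests(
    region: str,
    vpc_id: str,
    subnet_ids: list[str],
    security_group_ids: list[str],
    route_table_ids: list[str],
) -> list[dict[str, Any]]:
    """Build one request per endpoint; interface and gateway types take different fields."""
    requests: list[dict[str, Any]] = []
    for service in INTERFACE_SERVICES:
        # Interface endpoints place ENIs in subnets and are guarded by security groups.
        requests.append({
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
            "PrivateDnsEnabled": True,
        })
    for service in GATEWAY_SERVICES:
        # Gateway endpoints add prefix-list routes to route tables instead of ENIs.
        requests.append({
            "VpcEndpointType": "Gateway",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "RouteTableIds": list(route_table_ids),
        })
    return requests


# Hypothetical identifiers for illustration.
params = endpoint_requests(
    "us-east-1", "vpc-0example",
    ["subnet-a", "subnet-b", "subnet-c"], ["sg-0example"], ["rtb-0example"],
)
for p in params:
    print(p["VpcEndpointType"], p["ServiceName"])
```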

Subnet Distribution Across Availability Zones for Redshift Serverless

Redshift Serverless integration mandates at least two private subnets spanning distinct Availability Zones to satisfy high-availability constraints. Operators selecting bring-your-own (BYO) VPC configurations must distribute these resources across three Availability Zones. Each subnet requires a minimum reserve of three free IPs to avoid deployment failures when Enhanced VPC Routing remains disabled. This distribution strategy directly counters the risk of single-zone congestion that frequently halts compute provisioning in constrained regions.

Minimal footprint clashes with operational durability. While two subnets meet the absolute baseline, relying on the minimum introduces fragility during zone-level capacity shortages. AWS guidance explicitly warns against using constrained Availability Zones because insufficient IP space triggers extended application creation times or total failure. Production environments often sacrifice IP efficiency to guarantee that Redshift Serverless workloads survive zonal outages without manual intervention.

| Deployment Mode | Minimum Subnets | AZ Distribution | Risk Profile |
| --- | --- | --- | --- |
| Baseline Compliance | 2 | 2 Distinct Zones | High (Capacity Starvation) |
| Production Resilient | 3+ | 3 Distinct Zones | Low (Zonal Fault Tolerance) |

The cost of re-architecting a live SageMaker Unified Studio domain exceeds the initial overhead of allocating surplus address space. Operators ignoring this margin face measurable downtime when scaling compute fleets across multiple projects simultaneously.
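The subnet rules in this section lend themselves to a pre-deployment check. The following is a minimal sketch that assumes subnet records shaped like the entries returned by the EC2 `describe_subnets` call (`SubnetId`, `AvailabilityZone`, `AvailableIpAddressCount`); the function name and sample data are hypothetical.

```python
def check_subnets(subnets: list[dict], min_azs: int = 3, min_free_ips: int = 3) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    zones = {s["AvailabilityZone"] for s in subnets}
    if len(zones) < min_azs:
        problems.append(f"only {len(zones)} Availability Zone(s); need {min_azs}")
    for s in subnets:
        if s["AvailableIpAddressCount"] < min_free_ips:
            problems.append(
                f"{s['SubnetId']} has {s['AvailableIpAddressCount']} free IPs; need {min_free_ips}"
            )
    return problems


# Hypothetical sample: two zones only, one subnet nearly exhausted.
sample = [
    {"SubnetId": "subnet-a", "AvailabilityZone": "us-east-1a", "AvailableIpAddressCount": 120},
    {"SubnetId": "subnet-b", "AvailabilityZone": "us-east-1b", "AvailableIpAddressCount": 2},
]
for problem in check_subnets(sample):
    print(problem)
```

Passing `min_azs=2` reproduces the baseline row of the table, while the default of 3 matches the production-resilient row.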

Mandatory Endpoint Checklist for Air-Gapped SageMaker Domains

Three specific interface endpoints (DataZone, STS, and SageMaker) must exist before domain launch to prevent immediate control-plane failure. Operators often overlook that `com.amazonaws.${region}.datazone` handles governance while `com.amazonaws.${region}.sts` manages role assumption, creating a dependency chain where missing either blocks project initialization. The table below lists the architecture components.

| Service Type | Endpoint Name | Function |
| --- | --- | --- |
| Interface | com.amazonaws.${region}.glue | Data Catalog metadata |
| Interface | com.amazonaws.${region}.sts | IAM role assumption |
| Gateway | com.amazonaws.${region}.s3 | Dataset and notebook storage |
| Interface | com.amazonaws.${region}.sagemaker.api | API control signals |

Deployment fails if the VPC lacks at least two private subnets across distinct Availability Zones, as Redshift Serverless requires this distribution for high availability. Each subnet must reserve three free IPs to accommodate ENI allocation during compute provisioning. Endpoint Configuration documentation specifies that communication occurs entirely within the AWS backbone, eliminating NAT Gateway requirements. The cost implication involves hourly charges per interface endpoint, which accumulate rapidly across the twelve mandatory services. Omitting `com.amazonaws.${region}.sagemaker.runtime` allows domain creation but breaks inference invocation, a silent failure mode detectable only during model testing.
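A checklist like this is easy to verify mechanically before launching the domain. The sketch below compares the endpoint service names already present in a VPC against the names mentioned in this section; it covers only those names, not a complete list, and the function name and sample input are hypothetical.

```python
# Service suffixes taken from this section; not an exhaustive list of endpoints.
CHECKLIST = ("datazone", "sts", "glue", "s3", "sagemaker.api", "sagemaker.runtime")


def missing_endpoints(region: str, existing_service_names: list[str]) -> list[str]:
    """Return checklist service names that have no endpoint in the VPC yet."""
    present = set(existing_service_names)
    required = [f"com.amazonaws.{region}.{suffix}" for suffix in CHECKLIST]
    return [name for name in required if name not in present]


# Hypothetical state: runtime and DataZone endpoints were forgotten.
existing = [
    "com.amazonaws.us-east-1.sts",
    "com.amazonaws.us-east-1.glue",
    "com.amazonaws.us-east-1.s3",
    "com.amazonaws.us-east-1.sagemaker.api",
]
print(missing_endpoints("us-east-1", existing))
```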

Executing a Compliant VPC Deployment for SageMaker Unified Studio

VPC Creation Parameters for Air-Gapped SageMaker Domains

Navigate to Create VPC, choose the VPC and more option, then select 3 for Number of Availability Zones and 0 for Number of public subnets. Explicitly zeroing out the public subnet count prevents accidental internet gateway attachment.

  1. Enter airgapped as the Name tag auto-generation value.
  2. Retain default settings for IPv4 CIDR blocks and DNS options.
  3. Set Availability Zones to 3 to satisfy high-availability constraints.
  4. Set public subnets to 0 to enforce strict egress control.
  5. Choose Create VPC to generate the resources.

This configuration enables the VPC Only mode, which completely eliminates service account internet access unlike default templates. Distributing resources across three Availability Zones introduces a trade-off: immediate consumption of IP space. A /24 CIDR block proves insufficient for production scale, necessitating a /22 or larger allocation from the start. Failure to plan IP capacity for at least five years forces a destructive VPC replacement later.
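To make the address-space trade-off concrete, the sketch below carves a VPC CIDR into equal private subnets, one per Availability Zone, using Python's standard `ipaddress` module. The CIDR, zone names, and function name are illustrative assumptions; the console wizard chooses its own layout.

```python
import ipaddress


def plan_private_subnets(vpc_cidr: str, zones: list[str]) -> dict[str, str]:
    """Split the VPC range into equal blocks and assign one block to each zone."""
    network = ipaddress.ip_network(vpc_cidr)
    # Extra prefix bits needed so that there are at least len(zones) blocks.
    extra_bits = (len(zones) - 1).bit_length()
    blocks = network.subnets(prefixlen_diff=extra_bits)
    # zip stops at the last zone; any remaining blocks stay unallocated as headroom.
    return {zone: str(block) for zone, block in zip(zones, blocks)}


# Hypothetical /22 spread across three zones.
plan = plan_private_subnets("10.0.0.0/22", ["us-east-1a", "us-east-1b", "us-east-1c"])
for zone, cidr in plan.items():
    print(zone, cidr)
```

With three zones, a /22 yields four /24 blocks, so one block remains spare for later growth.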

Copy the Project ID from the SageMaker console to filter security groups for exact endpoint attachment.

  1. Navigate to the VPC console and initiate the Create Endpoint workflow.
  2. Paste the copied identifier into the security group filter to isolate the datazone--dev resource.
  3. Select the matching group name and attach it to the private subnets set in the architecture components.
  4. Repeat this sequence for the STS service to enable role assumption without public routing.

Manual binding prevents the control plane from defaulting to broader network rules that violate isolation policies. Operators ignoring this specific filter risk attaching overly permissive groups that expose the interface endpoint to unintended traffic sources. Validating AWS PrivateLink connectivity requires executing a SQL query within the project context to confirm metadata resolution. Failure at this stage usually indicates a subnet mismatch rather than a service outage. Strict adherence to the project-specific security group ensures the DataZone governance layer remains logically separated from other VPC resources.
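The filtering step above can also be done programmatically. This is a small sketch, assuming security group records shaped like the entries returned by the EC2 `describe_security_groups` call (`GroupId`, `GroupName`); the project ID and group names are made up for illustration and do not reflect a confirmed naming scheme.

```python
def groups_for_project(security_groups: list[dict], project_id: str) -> list[dict]:
    """Keep only the security groups whose name contains the project ID."""
    return [g for g in security_groups if project_id in g["GroupName"]]


# Hypothetical data; real group names come from your account.
groups = [
    {"GroupId": "sg-001", "GroupName": "default"},
    {"GroupId": "sg-002", "GroupName": "datazone-abc123-dev"},
    {"GroupId": "sg-003", "GroupName": "datazone-zzz999-dev"},
]
matches = groups_for_project(groups, "abc123")
print([g["GroupId"] for g in matches])
```

Expecting exactly one match and stopping otherwise is a reasonable guard against attaching an overly permissive group by accident.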

Subnet Selection and IAM Identity Center User Initialization

Selecting a minimum of two private subnets prevents immediate domain creation failures during the wizard validation phase. Operators must distribute these resources across multiple zones to satisfy the architecture requirements. IP capacity planning often ignores the five-year horizon, leading to exhaustion when scaling beyond initial pilot groups.

  1. Confirm private subnets exist in at least two distinct Availability Zones.
  2. Validate that no route tables point to an Internet Gateway.
  3. Prepare a valid corporate email address for the initial IAM Identity Center user.

Entering an incorrect email format halts the user provisioning process, requiring a full domain recreation to resolve. Unlike standard IDEs, this platform inherits VPC settings automatically, making the initial subnet choice irreversible without significant downtime.

| Parameter | Minimum Value | Validation Goal |
| --- | --- | --- |
| Private Subnets | 2 | Prevent single-zone failure |
| Public Subnets | 0 | Enforce air-gapped policy |
| User Email | 1 Valid | Enable admin access |

Skipping the email verification step leaves the domain orphaned with no administrative entry point.
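Two of the checks in this section can be run before opening the wizard. The sketch below assumes route table records shaped like the entries returned by the EC2 `describe_route_tables` call (`RouteTableId`, `Routes`, `GatewayId`, `DestinationCidrBlock`) and uses a deliberately loose email pattern; the function names and sample data are hypothetical.

```python
import re

# Loose shape check only: something@something.tld, with no whitespace.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def igw_routes(route_tables: list[dict]) -> list[tuple[str, str]]:
    """Return (route table ID, destination) pairs for routes that target an Internet Gateway."""
    hits = []
    for table in route_tables:
        for route in table.get("Routes", []):
            if route.get("GatewayId", "").startswith("igw-"):
                hits.append((table["RouteTableId"], route.get("DestinationCidrBlock", "?")))
    return hits


def looks_like_email(address: str) -> bool:
    """Catch obvious formatting mistakes before user provisioning."""
    return EMAIL_PATTERN.match(address) is not None


# Hypothetical route tables: the second one leaks to an Internet Gateway.
tables = [
    {"RouteTableId": "rtb-1", "Routes": [{"DestinationCidrBlock": "10.0.0.0/22", "GatewayId": "local"}]},
    {"RouteTableId": "rtb-2", "Routes": [{"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc"}]},
]
print(igw_routes(tables))
print(looks_like_email("admin@example.com"), looks_like_email("admin@example"))
```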

Validating Connectivity and Resolving Isolation Issues in Production

Defining Private Networking Constraints for SageMaker VPC Endpoints


Disabling auto-assign public IP on subnets forces all egress traffic through VPC endpoints to remain strictly within the AWS backbone. This configuration eliminates direct internet routing but introduces a hard dependency on DNS resolution for service discovery. Operators must enable DNS hostnames so that service names like `com.amazonaws.${region}.sagemaker.api` resolve to private interface addresses rather than public IPs. Without this setting, applications attempt outbound connections that fail immediately due to the missing default route to an internet gateway.

| Configuration Setting | Required Value | Failure Mode if Incorrect |
| --- | --- | --- |
| Auto-assign public IP | Disabled | Traffic leaks to IGW or drops |
| DNS hostnames | Enabled | Service names resolve to public IPs |
| Subnet count | Minimum 2 | Domain creation wizard rejects input |

A common oversight involves assuming that disabling public IPs alone suffices for isolation; however, without explicit endpoint policies, the control plane cannot validate connectivity in a SageMaker project.
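The three settings in the table above can be audited together. This sketch assumes subnet records carry the `MapPublicIpOnLaunch` flag as returned by the EC2 `describe_subnets` call, and that the DNS hostnames setting has already been read from the VPC attributes; the function name and sample data are hypothetical.

```python
def isolation_findings(
    subnets: list[dict],
    dns_hostnames_enabled: bool,
    min_subnets: int = 2,
) -> list[str]:
    """Return one finding per violated setting; an empty list means all three pass."""
    findings = []
    if len(subnets) < min_subnets:
        findings.append(f"subnet count {len(subnets)} is below the minimum of {min_subnets}")
    if not dns_hostnames_enabled:
        findings.append("DNS hostnames are disabled on the VPC")
    for subnet in subnets:
        if subnet.get("MapPublicIpOnLaunch"):
            findings.append(f"{subnet['SubnetId']} auto-assigns public IPs")
    return findings


# Hypothetical subnets: the second one still auto-assigns public IPs.
subnets = [
    {"SubnetId": "subnet-a", "MapPublicIpOnLaunch": False},
    {"SubnetId": "subnet-b", "MapPublicIpOnLaunch": True},
]
for finding in isolation_findings(subnets, dns_hostnames_enabled=False):
    print(finding)
```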

Three Availability Zones provide the necessary redundancy for workgroups using Enhanced VPC Routing to avoid capacity constraints. AWS documentation warns that restricting deployment to constrained Availability Zones triggers insufficient capacity errors and delays application creation times significantly. Operators often misinterpret the minimum requirement of two subnets as sufficient for production, ignoring the risk of single-zone failures during peak training loads. The limitation is clear: two zones meet basic connectivity but fail to sustain high-availability when one zone experiences degraded performance.

Architectural guides published by April 2, 2026, mandate distributing subnets across three Availability Zones. This distribution strategy directly mitigates the risk of project access failures when a specific zone loses connectivity to interface endpoints. Unlike competitor offerings, SageMaker's architecture explicitly supports this multi-zone model to maintain service continuity for generative AI workloads.

| Deployment Scope | Minimum AZs | Recommended AZs | Failure Mode Risk |
| --- | --- | --- | --- |
| Non-Production | 2 | 2 | Moderate |
| Production EVR | 2 | 3 | Low |
| High Availability | 2 | 3 | Minimal |

Implementing defense-in-depth carries a tangible cost: skipping the third zone risks loss of quorum during regional incidents, forcing manual failover procedures that violate compliance SLAs. Validation scripts must confirm DNS hostnames resolve correctly across all selected zones before enabling user profiles. Neglecting this step leaves the VPC routing table incomplete, causing silent drops for traffic destined to SageMaker interface endpoints.
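A validation script of the kind described above might look like the following sketch. It resolves a hostname and checks that every returned address falls in a private range, using only the standard library. The function names are hypothetical, and the hostname in the usage comment is an assumption about the regional SageMaker API name; run the lookup from an instance inside each selected zone.

```python
import ipaddress
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Resolve a hostname to its IPv4 addresses (requires DNS access from the caller)."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def all_private(addresses: list[str]) -> bool:
    """True only when at least one address was returned and every address is private."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)


# Offline illustration with literal addresses.
print(all_private(["10.0.1.25", "10.0.2.40"]))
print(all_private(["10.0.1.25", "52.94.1.1"]))

# Inside the VPC, a check could look like this (hostname is an assumption):
# all_private(resolve_ipv4("api.sagemaker.us-east-1.amazonaws.com"))
```

A public address in the result suggests private DNS is not taking effect for that endpoint, which matches the failure mode listed for the DNS hostnames setting.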

IP Capacity Exhaustion Risks in Five-Year Enterprise Scaling Plans

A /22 CIDR block supports 100 concurrent users: at approximately 5 IP addresses per user profile plus a baseline domain consumption of 11 IP addresses, the domain needs 511 active IP addresses. High-Concurrency Enterprise Scenario modeling shows that this per-user consumption creates a linear growth trajectory that exhausts /24 subnets within months. Operators frequently miscalculate this burn rate by ignoring the compound effect of unique instance types per user and average training instance counts. The cost of undersizing is an immediate development halt rather than gradual performance degradation. Engineering teams lose billable hours when capacity errors prevent new notebook kernels from launching.

| Subnet Size | Max Users Supported | Risk Horizon |
| --- | --- | --- |
| /24 | ~40 | <6 months |
| /23 | ~80 | <18 months |
| /22 | 100 | 5 years |

Planning IP capacity for at least 5 years requires projecting the number of apps per user alongside expected growth percentages. IP Planning Specifics dictate that insufficient subnet sizing effectively delays time-to-market by forcing mid-project network re-architecture. The limitation of static CIDR assignment is that expanding a VPC post-deployment often requires tearing down existing resources. InterLIR recommends validating connectivity in a SageMaker project only after confirming free IP headroom exceeds the five-year forecast.
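The sizing arithmetic in this section can be reproduced in a few lines. The sketch below uses the article's figures (11 baseline addresses, roughly 5 per user) and the five addresses AWS reserves in every subnet. It treats each CIDR as a single subnet, which is a simplification, and it does not model the extra headroom the table builds in. All names are hypothetical.

```python
BASELINE_IPS = 11            # domain baseline, per the article
IPS_PER_USER = 5             # approximate per-profile consumption, per the article
AWS_RESERVED_PER_SUBNET = 5  # addresses AWS reserves in each subnet


def required_ips(users: int) -> int:
    """Linear model: baseline plus a fixed cost per user."""
    return BASELINE_IPS + IPS_PER_USER * users


def usable_ips(prefix_len: int, subnet_count: int = 1) -> int:
    """Addresses left in an IPv4 block after AWS reservations."""
    return 2 ** (32 - prefix_len) - AWS_RESERVED_PER_SUBNET * subnet_count


def smallest_prefix(users: int, candidates: tuple[int, ...] = (24, 23, 22, 21, 20)) -> int:
    """Return the longest prefix (smallest block) that still fits the user count."""
    need = required_ips(users)
    for prefix_len in candidates:
        if usable_ips(prefix_len) >= need:
            return prefix_len
    raise ValueError(f"no candidate block fits {need} addresses")


print(required_ips(100))     # 511
print(usable_ips(23))        # 507
print(smallest_prefix(100))  # 22
```

Under this model a /23 falls four addresses short of the 511 needed for 100 users, which is why the next size up is required.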

About

Evgeny Sevastyanov serves as the Support Team Leader at InterLIR, a specialized IPv4 marketplace based in Berlin. While his daily work focuses on managing IP address leasing and maintaining clean BGP route objects, his deep expertise in network infrastructure makes him uniquely qualified to discuss Amazon SageMaker Unified Studio networking. Configuring a network-isolated VPC requires a fundamental understanding of IP allocation, subnetting, and secure connectivity principles that Evgeny applies routinely when managing resources in RIPE and APNIC databases. At InterLIR, ensuring security and transparency in IP transactions mirrors the strict compliance needs addressed in this article, such as HIPAA or FedRAMP standards. By using his practical experience with AWS PrivateLink concepts and private network architectures, Evgeny bridges the gap between raw IP resource management and advanced cloud AI security, offering readers a grounded perspective on building reliable, isolated environments for sensitive data initiatives.

Conclusion

Scaling SageMaker Unified Studio reveals a critical fracture point: network re-architecture becomes impossible without service interruption once IP exhaustion hits. The operational debt of undersized subnets manifests not as slow performance, but as an immediate hard stop on kernel launches, directly burning billable engineering hours. As 2026 trends push toward serverless AI and native Bedrock integration within isolated networks, the demand for ephemeral IP addresses will spike unpredictably, rendering static /24 or /23 planning models obsolete within months. Relying on current headroom without accounting for compound growth in instance types guarantees a capacity cliff that delays time-to-market.

Organizations must mandate a /22 CIDR minimum for any production environment targeting a five-year horizon, regardless of current team size. This specific sizing buffers against the linear growth of user profiles and the exponential variance of modern AI workloads. Do not wait for utilization alerts; the window to expand VPC ranges safely closes once initial resources are deployed. Start by auditing your current VPC subnet masks against the five-year user forecast before Friday's sprint planning. If your existing block is smaller than /22, draft a migration proposal immediately to avoid forcing a mid-project network teardown later.

Frequently Asked Questions

The unified interface carries no direct licensing or usage costs for users. You are billed exclusively for underlying AWS services like compute instances and storage volumes consumed during operations.

You must distribute private subnets across at least three Availability Zones for compliance. This mandatory foundation creates a compliant, air-gapped architecture superior to single-zone isolated deployments.

Quick Set Up templates cannot customize security groups before domain creation occurs. Production environments demand pre-configured network firewalls that automated wizards fail to provide for strict compliance.

Traffic routes exclusively over the AWS backbone via AWS PrivateLink instead of public internet. This ensures sensitive data never traverses public infrastructure while maintaining full functionality.

Operators incur hourly charges for each VPC Interface Endpoint plus data processing fees. These specific rates vary by region and apply alongside standard service consumption billing.