Route leak lessons: What the 25-minute Cloudflare outage...


A single automation error triggered a 25-minute BGP route leak that disrupted IPv6 traffic across Cloudflare's Miami infrastructure on January 22, 2026. This incident highlights that despite industry progress, manual configuration errors remain a critical vulnerability in global routing stability. Readers will learn the precise mechanics of RFC 7908 violations, analyze the specific AS path anomalies involving AS13335, and explore how RFC 9234 adoption offers a viable path toward automated prevention.

The event, documented by Cloudflare engineer Bryton Herdes, resulted from a merged code change that inadvertently advertised peer routes to upstream providers, creating a classic "valley-free" routing violation. While LACNIC reports a decline in such incidents due to rising RFC 9234 deployment, this outage proves that legacy policy gaps still allow simple mistakes to cascade into widespread congestion and packet loss. The failure specifically involved IPv6 prefixes, funneling external traffic into filters designed only for customer flows, which ultimately discarded valid packets.

Understanding the anatomy of this failure requires moving beyond basic definitions to examine how modern network automation can both cause and cure these outages. By dissecting the timeline from the initial code merge at 19:52 UTC to the detection window between 20:24 and 20:30 UTC, we reveal the narrow margin for error in inter-domain routing. The discussion concludes with actionable strategies for implementing role-based updates that prevent routers from accepting or propagating illegitimate paths before they impact the wider internet.

Defining BGP Route Leaks Through the Lens of RFC 7908 Standards

RFC 7908 Definition of BGP Route Leaks and Valley-Free Violations

RFC 7908, published in June 2016, defines a BGP route leak as the propagation of routing announcements beyond their intended scope, typically observed as an Autonomous System appearing unexpectedly in an AS path. The anomaly occurs when a network tells the broader Internet to send it traffic that it is not supposed to forward. The violation breaks valley-free routing principles, which mandate that routes received from a peer must never be advertised upstream to a provider.

| Direction | Source AS | Receiver AS | Valid Action |
|---|---|---|---|
| Customer → Peer | AS64501 | AS64502 | Accept |
| Peer → Provider | AS64502 | AS64503 | Reject |
| Provider → Peer | AS64503 | AS64502 | Accept |

Operational risk spikes when the leaker, such as AS64502 in standard models, lacks firewall filters to discard the unsolicited volume. Cloudflare's detection algorithms apply the valley-free model to estimate leak likelihood by identifying these specific policy violations, but detecting a leak does not prevent the initial congestion or packet loss on unprepared infrastructure. Operators must implement strict egress filtering, because reliance on downstream rejection fails during high-volume surges.

The January 22 incident manifested as a mixture of Type 3 and Type 4 valley-free routing violations per RFC 7908: updates flowed from AS64501 to peer AS64502, then unexpectedly upstream to provider AS64503. This topology breach forces the intermediate node to carry transit traffic it cannot process. Cloudflare acknowledged that the automation error pushed internal IPv6 prefixes to external peers in Miami. The event lasted 25 minutes before manual reversion stopped the advertisements, and the immediate operational consequence was severe congestion on backbone links within the affected data center, where roughly 12 Gbps of traffic was discarded due to missing firewall filters.
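The valley-free test described above can be sketched in a few lines of Python. The relationship table mirrors the AS64501/64502/64503 example; the data structure and function are illustrative, not Cloudflare's actual detection code:

```python
# Illustrative valley-free check under assumed AS relationships.
# REL[(a, b)] records what b is to a when a advertises a route to b.
REL = {
    (64501, 64502): "peer",      # AS64501 and AS64502 peer laterally
    (64502, 64501): "peer",
    (64502, 64503): "provider",  # AS64503 is AS64502's upstream transit
    (64503, 64502): "customer",  # ...so AS64502 is AS64503's customer
}

def is_valley_free(path, rel=REL):
    """Return False when a route that already crossed a peer link or
    went downhill to a customer is advertised upstream again -- the
    RFC 7908 leak pattern."""
    descending = False  # True once the route crosses a peer link or
                        # is advertised down to a customer
    for a, b in zip(path, path[1:]):
        step = rel[(a, b)]
        if descending and step in ("provider", "peer"):
            return False  # leak: going up or flat after going down
        if step in ("peer", "customer"):
            descending = True
    return True

# The Miami pattern: AS64501 -> peer AS64502 -> provider AS64503
print(is_valley_free([64501, 64502, 64503]))  # False: peer route leaked upstream
print(is_valley_free([64503, 64502]))         # True: provider-to-customer is normal
```

The single `descending` flag captures the whole policy: a path may climb through providers, cross at most one lateral stretch, and then must only descend.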

| Leak Direction | Source AS | Receiver AS | Outcome |
|---|---|---|---|
| Peer to Provider | AS64502 | AS64503 | Traffic Rejection |
| Provider to Peer | AS64503 | AS64502 | Normal Flow |

Assumptions about peer link safety for broad exports often prove false. This case proves policy automation requires strict guardrails. Agile configuration deployment conflicts with the absolute need for static safety checks on export policies. A single diff removing a prefix list can expose an entire network to unwarranted ingress loads. Redundancy costs remain negligible compared to the reputational damage of a global routing event. Network teams must treat BGP policy code with the same rigor as security-critical infrastructure.

Distinguishing BGP Route Leaks from Hijacking Using AS Path Anomalies

The April 2021 Vodafone incident, in which more than 30,000 prefixes were hijacked, illustrates the difference between unauthorized origin claims and policy violations. A BGP hijack asserts false ownership of address space, whereas a route leak preserves valid origins but violates the propagation rules set out in RFC 7908. Leaks occur when an Autonomous System appears unexpectedly in an AS path, breaking valley-free constraints without claiming prefix ownership; Cloudflare's detection algorithm primarily uses the valley-free model to estimate the likelihood of a route leak from these directional anomalies. The operational distinction dictates the response: hijacks require RPKI-based origin rejection, while leaks demand strict peer filtering to stop upstream propagation. Distinguishing these events in real time remains difficult because both manifest as sudden traffic surges on interconnect links. Operators face a tension between aggressive filtering, which risks blackholing legitimate traffic, and permissive policies that allow leakage to persist. Most networks lack the telemetry depth to instantly differentiate a malicious origin claim from a benign configuration error by a peer, forcing reliance on external validation sources rather than local path analysis alone.

| Feature | BGP Hijack | BGP Route Leak |
|---|---|---|
| Origin Validity | Invalid/Unauthorized | Valid/Authorized |
| Primary Violation | Ownership Claim | Propagation Policy |
| Path Behavior | Direct insertion | Upstream propagation |
| Intent Signal | Malicious Takeover | Automation Error |
| Mitigation | RPKI ROV Rejection | Peer Filtering |
| Detection Source | Origin validation | Directional anomaly |
| Frequency | Rare | Common |

Anatomy of the January 22 2026 Cloudflare Routing Failure

How Removing prefix-list 6-BOG04-SITE-LOCAL Caused the Miami Route Leak

The January 22, 2026 Cloudflare route leak originated when automation removed `prefix-list 6-BOG04-SITE-LOCAL` from export policies like `6-COGENT-ACCEPT-EXPORT`. This diff operation deleted the sole filter restricting advertisements to specific Bogotá prefixes, leaving the policy-statement logic dependent only on a broad `route-type internal` match. Consequently, the router marked every internal IPv6 prefix as acceptable for external advertisement rather than just the intended subset. The mechanism failed because the prefix-list acted as the only negative constraint; its absence created a default-permit state for all local routes.

| Policy Component | Pre-Incident State | Post-Incident State |
|---|---|---|
| Match Condition | `prefix-list 6-BOG04-SITE-LOCAL` | None (Implicit All) |
| Route Scope | Bogotá Data Center Only | All Internal Routes |
| Export Action | Accept Specific | Accept All Internal |
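Based on the identifiers named in the incident write-up, the change may have looked roughly like the following JunOS-style sketch; the term name and exact structure are assumptions, not Cloudflare's actual configuration:

```
/* Before: the prefix-list is the sole restrictive condition */
policy-statement 6-COGENT-ACCEPT-EXPORT {
    term export-internal {                      /* term name assumed */
        from {
            route-type internal;
            prefix-list 6-BOG04-SITE-LOCAL;     /* limits scope to Bogotá prefixes */
        }
        then accept;
    }
}

/* After the faulty diff: with the prefix-list gone, the broad
   route-type match marks every internal IPv6 route for export */
policy-statement 6-COGENT-ACCEPT-EXPORT {
    term export-internal {
        from route-type internal;
        then accept;
    }
}
```

A deletion-equals-permission failure mode like this is exactly why diff-time validation matters more than trusting the continued presence of a second match condition.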

However, this configuration pattern assumes operators will never remove a restrictive list without adding a replacement, an assumption that proved fatal here. The implication for network architects is that export policies requiring multiple conditions create single points of failure when one condition is deleted. Relying on the presence of a prefix list to prevent leaks introduces a silent failure mode where deletion equals permission.

The Cloudflare blog records the initial code merge triggering the bug at 19:52 UTC on January 22, 2026. This timestamp marks the entry of a logic error into the network automation repository that would later remove necessary prefix-list filters. The defect remained dormant within the version control system until the next scheduled execution window.

Automation triggered the faulty configuration on a single Miami edge-router at 20:25 UTC, instantly advertising internal routes to upstream providers. Detection tools identified anomalous AS path patterns involving AS13335 and AS32934 between 20:24:00Z and 20:30:00Z.

  1. The router stripped valley-free constraints from export policies.
  2. Internal IPv6 prefixes were advertised to peer and provider sessions.
  3. External networks began forwarding traffic toward the Miami data center.
  4. Firewall filters discarded unauthorized flows, creating immediate congestion.

Network operators manually reverted the configuration at 20:50 UTC, halting the leak after approximately 25 minutes of exposure. The cost of this manual intervention is the delay inherent in human verification; automated rollback systems could have reduced the window significantly but risk false positives. The incident highlights a tension between deployment speed and safety checks in network automation pipelines. Such safeguards prevent permissive diffs from reaching live routers without operator oversight.

Backbone Congestion and Firewall Discards: Impact of Leaked Internal Prefixes

Leaked traffic caused severe backbone congestion between the Miami and Atlanta routers. External packets entered the network core based on false BGP advertisements, only to encounter firewall filters configured strictly for legitimate Cloudflare service prefixes. These filters, designed to protect internal infrastructure, silently discarded the misrouted flows rather than forwarding them to their unintended destinations. The discard action prevented total system collapse but created immediate blackhole conditions for affected external parties.

| Failure Mode | Trigger Condition | Resulting Action |
|---|---|---|
| Backbone Saturation | Excessive inbound volume from peers | Link utilization spikes |
| Filter Mismatch | Source IP lacks allow-list tag | Packet drop (silent) |
| Path Asymmetry | Return path via valid route | One-way traffic flow |

Operators troubleshooting similar symptoms in JunOS must examine `show firewall filter` counters for sudden increment spikes on ingress interfaces. A critical limitation exists: standard monitoring often misses these discards because the traffic never reaches application logs. The cost of such permissive export policies is measurable disruption without clear visibility into the dropped volume.

An implicit-reject default offers a structural safeguard here: it forces a BGP session to drop all updates if the referenced policy object is missing or empty, rather than accepting all routes per standard protocol behavior. The January 22, 2026 incident occurred because removing a single prefix list left a `route-type internal` match too broad, allowing unintended advertisements to propagate globally without a final safety stop.

| Configuration State | Policy Behavior | Risk Profile |
|---|---|---|
| No Implicit Reject | Accepts all routes on null policy | Critical failure during automation errors |
| Implicit Reject Enabled | Drops all routes on null policy | Safe default; requires explicit allow lists |

The cost of this safety net is operational friction, as legitimate new peers require manual policy definition before any traffic flows. Operators cannot rely on dynamic discovery for initial connectivity when this safeguard is active. Most production networks still permit default-accept behaviors on internal peering groups to simplify temporary testing environments. This creates a hidden vulnerability where a script error can instantly expose the entire routing table. Without this constraint, automation scripts retain the ability to create open-ended leaks simply by omitting a filter.

Embedding Automatic Routing Policy Evaluation into CI/CD Pipelines

As reported by Cloudflare, adding automatic routing policy evaluation into CI/CD pipelines detects empty or erroneous policy terms before deployment. The mechanism inserts a validation step that parses policy-statement diffs to verify that removing a prefix-list does not leave a broad `route-type` match as the sole filter condition. This approach catches logic errors where a deletion creates an implicit permit-all state, which occurred when the Bogotá prefix removal exposed internal routes. However, the cost is increased pipeline latency, as complex policy trees require significant compute resources to simulate fully. Operators must balance speed of deployment against the depth of static analysis performed on every commit. Without this check, automation platforms blindly push configuration changes that violate valley-free routing principles. The implication for network teams is clear: code reviews alone cannot catch semantic errors in large-scale BGP policies. Structural validation must become a mandatory gate in the release process. Failure to automate this scrutiny leaves production networks vulnerable to simple syntax oversights. Real-time monitoring catches leaks after they happen, but pipeline evaluation stops them entirely. This shift transforms the CI/CD pipeline from a delivery tool into a primary security boundary.
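A minimal sketch of such a pipeline gate, assuming policies have already been parsed into a simple intermediate representation (the dict layout and term name below are assumptions, not a vendor parser):

```python
# Flag accept terms whose only remaining match condition is a broad
# route-type -- i.e. terms that became permit-all after a prefix-list
# was removed by a diff.
BROAD_MATCHES = {"route-type internal", "route-type external"}

def permissive_terms(policy):
    """Return names of accept terms left with no narrowing condition."""
    flagged = []
    for term in policy["terms"]:
        if term["action"] != "accept":
            continue
        narrowing = [c for c in term["from"] if c not in BROAD_MATCHES]
        if not narrowing:            # e.g. the prefix-list match is gone
            flagged.append(term["name"])
    return flagged

# State of 6-COGENT-ACCEPT-EXPORT after the faulty diff (term name assumed):
policy_after_diff = {
    "name": "6-COGENT-ACCEPT-EXPORT",
    "terms": [
        {"name": "export-internal",
         "from": ["route-type internal"],   # prefix-list match deleted
         "action": "accept"},
    ],
}
print(permissive_terms(policy_after_diff))  # ['export-internal'] -> fail the pipeline
```

A non-empty result would block the merge, forcing a human to confirm that the term is still constrained before the configuration reaches any router.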

Checklist for Deploying BGP Community-Based Export Safeguards

Per Cloudflare, patching routing policy automation failures requires explicit community-based safeguards that reject provider routes on external exports. This mechanism attaches specific BGP communities to internal prefixes, enabling edge routers to filter advertisements based on origin intent rather than simple prefix matching. Operators must configure export policies that deny any route lacking a validated customer or static community tag before it reaches a transit peer. However, the limitation is operational complexity; maintaining consistent community schemas across diverse vendor platforms introduces configuration drift risks that static analysis tools often miss. The implication for network teams is a mandatory shift toward centralized policy generation where community assignment is inseparable from prefix definition.

  1. Define strict export communities for every customer and peering session type.
  2. Configure default-deny rules on all egress policies lacking valid community matches.
  3. Validate AS path integrity against RFC 9234 role attributes before applying community tags.
  4. Audit existing prefix-list entries to remove implicit permit-all conditions.

| Policy Element | Validation Target | Failure Consequence |
|---|---|---|
| Community Match | Verifies origin intent | Leaks internal routes |
| Role Attribute | Confirms peer type | Violates valley-free model |
| Prefix Filter | Restricts address space | Advertises bogon ranges |

This approach prevents the specific failure mode where removing a single prefix list exposes all internal routes. Automation must verify that no export policy relies solely on broad route-type matches without community confirmation.
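As a concrete illustration of the checklist, a community-gated export policy might look like this JunOS-style sketch; the community name, value, and term names are illustrative assumptions:

```
policy-options {
    /* tag assigned where customer and static routes are first defined */
    community CUSTOMER-ROUTES members 64500:100;

    policy-statement TRANSIT-EXPORT {
        term allow-tagged {
            from community CUSTOMER-ROUTES;   /* origin intent, not just prefix match */
            then accept;
        }
        term default-deny {
            then reject;                      /* explicit final reject */
        }
    }
}
```

The explicit trailing reject is the key design choice: even if automation removes an earlier match condition, unmatched routes fall through to a deny rather than to protocol-default acceptance.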

Real-Time Detection and Recovery Protocols for BGP Anomalies

RFC 9234 BGP Roles and Valley-Free Model Enforcement

[Image: Dashboard showing BGP anomaly metrics including 30,000 monitored prefixes, 67% detection rate improvement, global routing volume trends up to 667.9 billion, and prefix block granularities ranging from 1 to 6 blocks.]

RFC 9234 introduces BGP Roles, advertised via a Role Capability when an eBGP session is established, together with an Only-to-Customer (OTC) attribute carried in UPDATE messages. According to Noction, this mechanism encodes neighbor relationships directly into the protocol exchange to prevent unintended path propagation. The valley-free model assumes that traffic should not flow from one provider to another without passing through a customer, and according to the Cloudflare blog, strict adherence to this principle stops provider-to-provider violations before they congest backbone links. However, the limitation is that legacy equipment often ignores these optional capabilities, leaving the session vulnerable to manual configuration drift like the January 2026 Miami incident. Operators must deploy real-time monitors on AS_PATH attributes to detect when a peer violates its declared role by advertising routes learned from another provider. This requirement creates tension between rapid failover logic and strict policy enforcement during outages.
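On platforms that implement RFC 9234, enabling roles is a small configuration step. The sketch below uses FRR-style syntax as an assumed example; exact keywords and availability vary by implementation and version:

```
! Assumed FRR-style configuration, shown for illustration only.
router bgp 64502
 ! AS64503 is the upstream transit provider on this session.
 neighbor 192.0.2.1 remote-as 64503
 ! Declare our side of the relationship; the provider configures the
 ! complementary "provider" role on its side.
 neighbor 192.0.2.1 local-role customer
```

Once both sides agree on roles, routes learned from the provider carry the OTC attribute and are blocked from being re-advertised to other providers or peers, which is precisely the propagation pattern behind the Miami leak.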

As reported by Cloudflare incident report, operators manually reverted the configuration at 20:50 UTC after detecting AS13335 path anomalies. Real-time monitoring via BGPalerter streams RIPE RIS updates to flag unexpected prefix advertisements immediately during such events. Historical dumps from BGPKit provide forensic context, yet live streaming remains necessary for stopping active leaks before congestion saturates backbone links. Operators must pause automation pipelines instantly upon anomaly detection to prevent the system from re-applying the faulty logic. This manual intervention halts the propagation of invalid routes that violate valley-free principles. However, pausing automation creates a blind spot where legitimate topology changes cannot be deployed until health checks pass. The implication is a mandatory operational protocol requiring immediate script suspension followed by targeted configuration rollback.

  1. Identify the specific edge-router emitting invalid AS paths using real-time dashboards.
  2. Execute an immediate pause on the network automation agent for that device.
  3. Revert the local configuration to the last known good state manually.
  4. Verify BGP session stability before resuming any automated policy runs.

The cost of delayed manual reversion exceeds the risk of temporary deployment freezes.

CI/CD Pipeline Checks for Empty or Erroneous Policy Terms

Embedding automatic routing policy evaluation into CI/CD pipelines prevents empty policy terms from reaching production routers. This mechanism parses configuration diffs to detect when removing a prefix-list leaves a broad `route-type` match as the sole filter condition. Machine learning classifiers trained on AS path length and historical behavior offer early warnings by flagging deviations before deployment. A study in Electronics journal notes these models improve experimental accuracy when integrated with real-world scenarios. However, the limitation is increased pipeline latency, as complex policy trees require significant compute resources to simulate fully. Operators must balance deployment speed against the depth of static analysis performed on every commit.

| Check Type | Target Error | Action Triggered |
|---|---|---|
| Diff Analysis | Removed prefix lists | Block merge if term becomes permissive |
| Path Simulation | Valley-free violation | Alert on provider-to-provider leaks |
| Community Validation | Missing tags | Reject export lacking intent markers |

  1. Parse the git diff for deleted prefix-list references.
  2. Simulate the resulting policy tree for implicit permit-all states.
  3. Block the merge request if any term lacks explicit filters.
  4. Log the specific router target for manual review.
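Step 1 of the list above can be sketched as a small diff gate; the unified-diff shape and regex are simplified assumptions rather than a full JunOS parser:

```python
import re

# Scan a unified diff for prefix-list references that were deleted and
# never re-added; any vanished name conservatively blocks the merge.
PREFIX_LIST_RE = re.compile(r"prefix-list\s+([\w-]+)")

def removed_prefix_lists(diff_text):
    """Names on deleted (-) lines that appear on no added (+) line."""
    removed, added = set(), set()
    for line in diff_text.splitlines():
        if line.startswith(("---", "+++")):
            continue                    # file headers, not content lines
        if line.startswith("-"):
            removed.update(PREFIX_LIST_RE.findall(line))
        elif line.startswith("+"):
            added.update(PREFIX_LIST_RE.findall(line))
    return removed - added

def should_block_merge(diff_text):
    """Conservative gate: any vanished prefix-list blocks the merge."""
    return bool(removed_prefix_lists(diff_text))

# A diff shaped like the incident: the restrictive list is deleted and
# only the broad route-type match remains.
incident_diff = """\
--- a/export-policy.conf
+++ b/export-policy.conf
-        prefix-list 6-BOG04-SITE-LOCAL;
         route-type internal;
"""
print(should_block_merge(incident_diff))  # True
```

Because the check keys on names rather than policy semantics, a renamed list also blocks the merge; that false positive is cheap compared with an implicit permit-all reaching production.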

InterLIR recommends enforcing these checks to stop automation errors like the Miami incident. The consequence most teams overlook is that quicker deployment cycles increase the probability of untested logic errors entering the global table.

About

Vladislava Shadrina Customer Account Manager at InterLIR brings a unique client-focused perspective to the critical discussion of BGP route leaks. While her background spans architecture and design, her daily work at InterLIR involves guiding clients through the complexities of secure IPv4 resource acquisition, where network stability and reputation are paramount. At InterLIR, a Berlin-based marketplace specializing in clean IP assets with verified BGP and Route Objects, Vladislava ensures customers understand the importance of rigorous routing policies. This article analyzes the January 2026 Cloudflare incident to highlight why transparency and security in IP management matter. By connecting real-world routing failures to InterLIR's mission of providing reliable network resources, she illustrates how proper infrastructure hygiene prevents congestion and latency. Her role bridges technical incidents and customer trust, emphasizing that reliable IP address management is essential for maintaining global internet integrity and avoiding the cascading effects of accidental prefix leaks.

Conclusion

Automation errors that push internal prefixes to external peers reveal a critical fragility: manual reversion is no longer a viable safety net at modern deployment scales. While the Miami incident was contained in 25 minutes, the operational cost of such exposure grows exponentially as traffic volumes surge toward terabit thresholds. Relying on human reaction times creates an unacceptable window where discarded traffic and potential hijacks can cripple connectivity. The industry must shift from reactive cleanup to proactive policy enforcement within the code itself.

Organizations should mandate that all BGP configuration changes pass through a blocking CI/CD gate that simulates the full policy tree before any merge occurs. Specifically, if a diff removes a prefix-list reference, the pipeline must instantly reject the commit unless an explicit filter replaces it. This approach eliminates implicit "permit-all" states that often slip past human review during rapid iteration cycles. Do not wait for a major outage to implement this; start enforcing these checks within the next quarter to align with increasing automation velocities.

Start by auditing your current git hooks this week to ensure they parse configuration diffs for deleted prefix-list references. If your pipeline cannot currently simulate the resulting policy tree to detect broad match conditions, you are operating on borrowed time.

Frequently Asked Questions

How much traffic was discarded during the Cloudflare Miami BGP route leak incident?
Missing firewall filters caused significant traffic loss during the event: roughly 12 Gbps of traffic was discarded on the affected routers.
What specific routing violation type occurred during the January 2026 Cloudflare automation error?
The incident involved advertising peer routes upstream to providers unexpectedly. This action created a mixture of Type 3 and Type 4 valley-free routing violations per RFC7908 standards.
Why did the leaked IPv6 prefixes cause congestion instead of normal forwarding?
Routers lacked filters to accept the unintended volume of transit traffic. Consequently, valid packets were discarded by filters designed only for specific customer service flows.
How long did the BGP route leak persist before operators manually reverted the change?
The disruption continued until network staff identified and stopped the bad advertisements; the event lasted 25 minutes before manual reversion took effect.
What distinguishes this BGP route leak from a traditional prefix hijacking event?
A leak violates propagation rules while keeping valid origins, unlike hijacking which claims false ownership. This preserves origin integrity but breaks valley-free routing principles entirely.
Vladislava Shadrina
Customer Account Manager