Reducing Complexity and Simplifying Connectivity When Creating One Outbound Point on AKS
Fewer than 20 companies around the world are certified as Kubernetes on Azure Expert MSPs, and we love being part of that exclusive list! However, despite our exhaustive knowledge of the cloud, occasionally we come across a customer problem that doesn't have a mature solution in Azure. That's when we have to put our thinking caps on!
That's what happened recently with a customer challenge on AKS, Azure Kubernetes Service. If you're running multiple AKS clusters, want a separate VNet for each cluster, and need a single outbound IP for all of them, this one is for you!
Understanding the need to create one outbound IP
When all of a customer's clusters are in the same VNet, outbound access is not a problem, as you can use NAT Gateway. However, on Azure, a NAT Gateway cannot span multiple VNets. Since the best practice when working with AKS is to separate each cluster into its own VNet, this becomes a problem when working with Azure NAT Gateway. For example, if you need access to an external API that is protected by a firewall solution, you will need to allowlist each cluster's outbound IP in that firewall individually. As your environments scale and grow, this causes a huge amount of complexity and manual work, not to mention that you could hit the firewall's rule limit much sooner.
There is currently no official recommendation from Microsoft on how to overcome this challenge with more than a single VNet, and so we set about creating a solution for our customers, connecting many different components within Azure together to engineer something unique.
The solution from 2bcloud
After recognizing the limitations of using NAT Gateway across multiple VNets, we set about creating a hub and spoke architecture. This allows multiple spoke VNets to reach the same hub, and ultimately to share the single (hub) NAT Gateway.
The hub VNet provides the internet access; each cluster's spoke VNet connects to the hub via VNet peering, routing all of its traffic through this single hub and out to the internet. With this in place, all the clusters present a single IP when accessing the internet.
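A minimal Azure CLI sketch of the peering step, assuming a hub VNet `vnet-hub` and a spoke VNet `vnet-spoke1` in a resource group `rg-network` (all names are placeholders). Peering must be created in both directions, and the spoke side must allow forwarded traffic so the firewall in the hub can route on its behalf:

```shell
# Spoke -> hub: allow the spoke to send traffic into the hub
az network vnet peering create \
  --name spoke1-to-hub \
  --resource-group rg-network \
  --vnet-name vnet-spoke1 \
  --remote-vnet vnet-hub \
  --allow-vnet-access \
  --allow-forwarded-traffic

# Hub -> spoke: the reverse peering, so return traffic can reach the spoke
az network vnet peering create \
  --name hub-to-spoke1 \
  --resource-group rg-network \
  --vnet-name vnet-hub \
  --remote-vnet vnet-spoke1 \
  --allow-vnet-access \
  --allow-forwarded-traffic
```

Repeat the pair of commands for each additional spoke (cluster) VNet.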
A company might have 50 clusters in every region, and then 40 VMs within each cluster, so you can immediately see the benefits of consolidating, streamlining and centralizing the process. With this hub and spoke methodology, all the clusters would be able to communicate with the internet, using one single IP.
Within this solution, we needed to create a custom route in each spoke VNet, using an Azure route table that points to a virtual appliance in the hub VNet (this virtual appliance must have a private IP). Note: Microsoft recommends using Azure Firewall as the virtual appliance. A network virtual appliance is a VM completely dedicated to a single networking task, such as a firewall (like Sophos) or routing (like NGINX). In our case, it's a firewall.
The first custom route we need to add to the route table sends 0.0.0.0/0 to the virtual appliance's private IP (next hop type: Virtual Appliance).
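The route described above can be sketched with the Azure CLI as follows; the firewall private IP `10.0.1.4` and the subnet/route-table names are placeholder assumptions:

```shell
# Create a route table for the spoke
az network route-table create \
  --name rt-spoke1 \
  --resource-group rg-network \
  --location eastus

# Default route: send all outbound traffic to the firewall's private IP
az network route-table route create \
  --name default-via-firewall \
  --resource-group rg-network \
  --route-table-name rt-spoke1 \
  --address-prefix 0.0.0.0/0 \
  --next-hop-type VirtualAppliance \
  --next-hop-ip-address 10.0.1.4

# Associate the route table with the AKS cluster's subnet
az network vnet subnet update \
  --resource-group rg-network \
  --vnet-name vnet-spoke1 \
  --name snet-aks \
  --route-table rt-spoke1
```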
Because all the outbound and inbound connections to the AKS clusters now go through Azure Firewall, it creates a problem called "asymmetric routing". This occurs when a packet takes one route to its destination and a different one back to the source. Inbound traffic reaches the application through the public load balancer's IP address, but the response is routed out through the firewall, so it leaves from the firewall's IP instead.
To solve this, you need to create new public IPs for DNAT (Destination Network Address Translation) rules (in addition to your application's public IP), attach them to the firewall, and add DNAT rules that translate these new public IPs to your application's frontend IP.
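A sketch of that DNAT step using the Azure CLI (the `az network firewall` commands come from the `azure-firewall` CLI extension). The firewall name `fw-hub` and the IP addresses are placeholder assumptions, using documentation ranges:

```shell
# Hypothetical values: the public IP attached to the firewall and the
# application's frontend IP that inbound traffic should be translated to
FW_PUBLIC_IP=203.0.113.10
APP_FRONTEND_IP=203.0.113.20

# Create a new public IP and attach it to the firewall as an extra ip-config
az network public-ip create \
  --name pip-fw-app1 \
  --resource-group rg-network \
  --sku Standard \
  --allocation-method Static

az network firewall ip-config create \
  --firewall-name fw-hub \
  --resource-group rg-network \
  --name fw-app1-ipconfig \
  --public-ip-address pip-fw-app1

# DNAT rule: inbound HTTPS on the firewall's public IP -> the app frontend
az network firewall nat-rule create \
  --firewall-name fw-hub \
  --resource-group rg-network \
  --collection-name app1-dnat \
  --name app1-https \
  --priority 100 \
  --protocols TCP \
  --source-addresses '*' \
  --destination-addresses "$FW_PUBLIC_IP" \
  --destination-ports 443 \
  --translated-address "$APP_FRONTEND_IP" \
  --translated-port 443
```

With this in place, both directions of the flow pass through the firewall, which resolves the asymmetric routing.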
When working with AKS behind Azure Firewall, you need to open a specific set of ports, FQDNs and networking configurations for the cluster to function correctly.
In addition to the AKS-specific rules, Azure Firewall allowed us to create uniquely prioritized DNAT rules, as well as rules at the application level (HTTP, HTTPS, MSSQL, etc.) and the network (TCP/UDP) level.
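As an illustration of those AKS-specific openings, here is a sketch based on Microsoft's documented AKS egress requirements (the node-to-control-plane tunnel ports and the built-in `AzureKubernetesService` FQDN tag); firewall and collection names are placeholders:

```shell
# Network rules: UDP 1194 and TCP 9000 to AzureCloud are the tunnel ports
# AKS nodes use to reach the managed control plane
az network firewall network-rule create \
  --firewall-name fw-hub \
  --resource-group rg-network \
  --collection-name aks-network \
  --name api-udp \
  --priority 100 \
  --action Allow \
  --protocols UDP \
  --source-addresses '*' \
  --destination-addresses AzureCloud \
  --destination-ports 1194

# Second rule in the same collection (priority/action already set above)
az network firewall network-rule create \
  --firewall-name fw-hub \
  --resource-group rg-network \
  --collection-name aks-network \
  --name api-tcp \
  --protocols TCP \
  --source-addresses '*' \
  --destination-addresses AzureCloud \
  --destination-ports 9000

# Application rule: allow all required AKS FQDNs via the built-in FQDN tag
az network firewall application-rule create \
  --firewall-name fw-hub \
  --resource-group rg-network \
  --collection-name aks-app \
  --name aks-fqdns \
  --priority 100 \
  --action Allow \
  --source-addresses '*' \
  --fqdn-tags AzureKubernetesService \
  --protocols 'http=80' 'https=443'
```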
Because we attached multiple public IPs to the firewall, all the clusters could now reach the internet through any one of them, but we only want a single public IP. We solve that by attaching an Azure NAT Gateway to the firewall's subnet in the hub VNet. The last thing to do is add a route to the route table sending traffic destined for the NAT Gateway's public IP to the next hop type Internet.
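A sketch of attaching the NAT Gateway, assuming the firewall lives in the standard `AzureFirewallSubnet` of the hub VNet (resource names are placeholders):

```shell
# One static public IP: this becomes the single outbound IP for everything
az network public-ip create \
  --name pip-natgw \
  --resource-group rg-network \
  --sku Standard \
  --allocation-method Static

# Create the NAT gateway with that public IP
az network nat gateway create \
  --name natgw-hub \
  --resource-group rg-network \
  --location eastus \
  --public-ip-addresses pip-natgw

# Associate the NAT gateway with the firewall's subnet in the hub VNet,
# so all egress from the firewall leaves via the NAT gateway's IP
az network vnet subnet update \
  --resource-group rg-network \
  --vnet-name vnet-hub \
  --name AzureFirewallSubnet \
  --nat-gateway natgw-hub
```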
When creating an AKS cluster with this setup, you need to make sure the cluster identity (service principal or managed identity) has permission to create routes on the route table. For every cluster you need to create a VNet, two public IPs (one for the app and one for the firewall), plus a designated route table.
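A sketch of the cluster-creation step: granting the identity rights on the route table, then creating the cluster with user-defined routing so AKS does not provision its own egress path. The `<...>` resource IDs are deliberately left as placeholders for your own environment:

```shell
# Give the cluster identity permission to manage routes on the route table
az role assignment create \
  --assignee <cluster-identity-client-id> \
  --role "Network Contributor" \
  --scope <route-table-resource-id>

# Create the cluster in the spoke subnet; userDefinedRouting tells AKS
# that egress is handled by our route table + firewall, not a new LB/IP
az aks create \
  --name aks-spoke1 \
  --resource-group rg-aks \
  --enable-managed-identity \
  --network-plugin azure \
  --vnet-subnet-id <spoke-subnet-resource-id> \
  --outbound-type userDefinedRouting
```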
The only shared resources between the clusters are the firewall and NAT gateway.
This is how it all looks together:
The benefits for today’s AKS customers
It's important to take some time to consider your options on AKS, as Azure Firewall is a costly solution, even without any modifications or engineering involved. However, when implemented with best practices, a hub and spoke solution eliminates the complexity of managing a scaling Azure cloud environment, allowing you to send all traffic through a secure, available, and highly performant hub, connected to all clusters via VNet peering.
Because the solution is delivered as a fully automated managed service, customers get its value the moment they create new clusters, including the peering, the firewall rules, and better visibility and control over all edge connections and third-party relationships.