Understanding NSX distributed routing

Submitted by Robin van Altena on Tue, 12/13/2022 - 16:57
 
 
Sometimes the simplest setting can have a big impact. Unfortunately, this is also true for several settings in NSX-T. A few weeks back, I received a call from a former colleague who is still a good friend. He was working on an NSX deployment and found something he wasn’t expecting: traffic was flowing through the NSX Edges while there was no reason for it. No Edge services were being used and the traffic was not exiting through the Tier-1. After a short discussion I pointed him to the Edge Cluster setting that is available when creating a Tier-1, and suggested that selecting an Edge cluster there was probably the cause. He tested the setting and wrote some excellent documentation for his customer. After reading the document he wrote, we decided it was too good not to share, hence this blog.

So, although I’m writing this blog, it is based on a document Rob van der Wouw wrote for a customer for whom he was building an NSX environment. We wanted to share it because there isn’t much information available on this specific question. We therefore removed the customer-specific information, and I added some screenshots from our lab to fill in the gaps. Hopefully this blog will help some of you with the question: why does traffic flow through the NSX Edge when I’m not using stateful services in my environment?

Before we can answer that question, a short introduction is in order. You can also scroll down to the next chapter, Selecting an Edge cluster during deployment.

The concept and its implementation

One of the more complex, but also very powerful, aspects of NSX is its ability to perform distributed routing. Instead of having to go through an external router to get from one IP subnet to another, one can use the NSX gateways to perform this routing. A main difference between a traditional network and NSX is that such a gateway can run on each Transport Node (ESXi host or NSX Edge) participating in an NSX fabric. This means that for each NSX overlay segment, a local instance of the Tier-0 or Tier-1 gateway is present on each host. This allows for what we call 'first hop routing': the routing decision is taken on the Transport Node by the local instance of the Tier-1 gateway, and the packet is forwarded directly to the destination host on its destination overlay network.

When two subnets are connected to the same NSX gateway (a Tier-1 or a Tier-0), this gateway can route packets between these subnets without the need for an external gateway device. Since each Transport Node runs its own distributed router instance of this gateway, it can perform the routing lookup locally and send the packet directly to the host that runs the destination VM on its corresponding logical segment. So, routing is done on the source host and the rest is just Geneve overlay switching towards the destination host. The destination host also runs a distributed router instance of the same gateway. This ensures that the return packets can be routed locally on that host as well and then switched via the Geneve overlay to the source host.
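The local lookup described above can be sketched in a few lines of Python. This is only a simplified model of what a distributed router instance does, not NSX code; the segment names and prefixes are made up for illustration.

```python
import ipaddress

# Hypothetical segments connected to one gateway (names and prefixes are invented).
CONNECTED_SEGMENTS = {
    "web-segment": ipaddress.ip_network("10.0.1.0/24"),
    "app-segment": ipaddress.ip_network("10.0.2.0/24"),
}

def route_locally(dst_ip):
    """Return the destination segment if the local DR instance can route to it, else None."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [
        (net.prefixlen, name)
        for name, net in CONNECTED_SEGMENTS.items()
        if dst in net
    ]
    if not matches:
        return None  # not connected to this gateway: another router has to take over
    return max(matches)[1]  # the longest matching prefix wins

print(route_locally("10.0.2.25"))     # app-segment
print(route_locally("192.168.50.1"))  # None
```

A destination on a connected segment is resolved on the source host itself. Anything else falls through to the next router, which is where the tiers come in.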

When a VM wants to send a packet to a destination that is not directly connected to its own gateway, that gateway needs to forward the packet to another router that may be able to deliver it to its destination. This is where tiered routing is usually applied. Within NSX we have Tier-0 and Tier-1 gateways.

Multi-tier routing in NSX

An overview of the main differences between a Tier-1 and a Tier-0 gateway is shown in this table:

Table: main differences between a Tier-1 and a Tier-0 gateway

The last difference between the Tier-0 and Tier-1 is the interesting one, because it leads to the question we are trying to answer: why does traffic flow through the NSX Edge when I’m not using stateful services in my environment?

To understand this, we need to look a little deeper into the Tier-0 and Tier-1 gateways.

Some functions that are stateful in nature, like firewalls or NAT, require that the packets are handled by a single instance of the gateway. This instance is called a 'service router', in contrast to the 'distributed router' component, which runs on all the Transport Nodes in a fabric. The service router only runs on the NSX Edges.

The main differences between a Distributed Router and a Service Router are shown in this diagram:

Differences between service and distributed router components in NSX

So, all NSX gateways are in essence the same. There are two flavors, Tier-0 and Tier-1, but both MUST have a DR component and CAN have an SR component, which always runs on an Edge VM in an Edge cluster:

Routing components in NSX Gateways

So, if you are not using stateful services on the Tier-1, you should not create a service router component. Yet creating one is exactly what happens when you select an Edge cluster during deployment of a Tier-1 gateway: NSX assumes you will be using stateful services and automatically instantiates a Tier-1 Gateway Service Router on the NSX Edge.
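To make the rule concrete, here is a minimal Python sketch of the relationship between the Edge cluster setting and the gateway components. The class and field names are my own invention for illustration and do not correspond to any NSX object model.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Tier1Gateway:
    """Toy model of a Tier-1 gateway; not an NSX API object."""
    name: str
    edge_cluster: Optional[str] = None  # None models "Not Set" in the UI

    def components(self) -> List[str]:
        parts = ["DR"]  # the distributed router always exists
        if self.edge_cluster is not None:
            parts.append("SR")  # selecting an Edge cluster instantiates an SR
        return parts

print(Tier1Gateway("t1-a").components())                     # ['DR']
print(Tier1Gateway("t1-b", "edge-cluster-01").components())  # ['DR', 'SR']
```

Note that the model has no notion of "services in use": the SR appears as soon as an Edge cluster is set, which is exactly the behaviour that catches people out.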

Selecting an Edge cluster during deployment

Now that we know this, what exactly happens to the flows within your NSX environment when you select an Edge cluster during deployment of a Tier-1 gateway, especially when you have no intention of using stateful services like the gateway firewall or NAT?

So, what is the difference between selecting the Edge cluster and leaving it at Not Set?

Selecting an Edge Cluster during deployment

When a Tier-1 Gateway Service Router is instantiated on the NSX Edge, this has a major consequence for the forwarding tables in the Tier-1 gateway DR components: a Tier-1 gateway with an SR component will ALWAYS have a default route to its SR component and will only route to its own connected segments in a distributed manner. The Tier-1 SR component in turn has its default route pointed to the Tier-0 DR to which it is connected.
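The forwarding rule above can be expressed as a small decision function. This is a simplified sketch with made-up prefixes and my own labels for the next hops; a real forwarding table contains more than a connected-versus-default split.

```python
import ipaddress

def tier1_dr_next_hop(dst_ip, connected_prefixes, has_sr):
    """Where a Tier-1 DR instance on an ESXi host sends a packet (simplified model)."""
    dst = ipaddress.ip_address(dst_ip)
    if any(dst in ipaddress.ip_network(prefix) for prefix in connected_prefixes):
        return "connected segment, routed on the local host"
    # Everything else follows the default route.
    if has_sr:
        return "Tier-1 SR on the NSX Edge"
    return "Tier-0 DR on the local host"

prefixes = ["10.0.1.0/24", "10.0.2.0/24"]  # hypothetical segments on this Tier-1
print(tier1_dr_next_hop("10.0.2.25", prefixes, has_sr=True))
print(tier1_dr_next_hop("10.0.9.9", prefixes, has_sr=True))
print(tier1_dr_next_hop("10.0.9.9", prefixes, has_sr=False))
```

Only the second call leaves the host towards the Edge. The same destination with has_sr=False stays on the local Transport Node for its next routing step.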

Another consequence of this behaviour is that NSX no longer instantiates a Tier-0 gateway DR instance on the ESXi Transport Nodes, since it is not needed there: all traffic leaving the Tier-1 is forwarded to the Tier-1 SR component on the NSX Edge anyway. This is especially relevant in NSX environments where multiple Tier-1 gateways are connected to one Tier-0 gateway.

Let us demonstrate this with some examples.

When a Tier-1 gateway has NO Edge cluster assigned to it, NSX does not instantiate a Tier-1 SR instance on the Edge node, and it deploys a Tier-0 DR instance on each ESXi Transport Node that also runs this 'distributed Tier-1 gateway'. See the left example in the picture below.

If an Edge cluster is selected, the situation changes. NSX then deploys a Tier-1 SR instance on the Edge node and the Tier-0 DR instance isn’t instantiated on the ESXi Transport nodes. See the right example in the picture below.
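The two situations can be summarised in a short Python sketch that lists which router instances to expect on an ESXi Transport Node. It only models the two cases described here. The behaviour for a mix of Tier-1s with and without an Edge cluster is my own assumption, and segments attached directly to the Tier-0 are ignored.

```python
def instances_on_esxi(tier1_edge_clusters):
    """Expected DR instances on an ESXi host (simplified model).

    tier1_edge_clusters maps a Tier-1 name to its Edge cluster name, or None for "Not Set".
    """
    instances = ["DR-" + name for name in tier1_edge_clusters]
    # Assumption: the Tier-0 DR is placed on the host only when at least one
    # Tier-1 has no SR and therefore hands traffic to the Tier-0 DR locally.
    if any(cluster is None for cluster in tier1_edge_clusters.values()):
        instances.append("DR-Tier-0")
    return instances

print(instances_on_esxi({"T1-A": None, "T1-B": None}))
# ['DR-T1-A', 'DR-T1-B', 'DR-Tier-0']
print(instances_on_esxi({"T1-A": "edge-cluster-01", "T1-B": "edge-cluster-01"}))
# ['DR-T1-A', 'DR-T1-B']
```

The names T1-A, T1-B and edge-cluster-01 are placeholders, not the names used in the lab.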

No Edge cluster selected for the Tier-1 (left) versus an Edge cluster selected for the Tier-1 (right)

This can also be seen from the NSX command line. For this I have created two Tier-1s attached to the same Tier-0, with no Edge cluster selected:

Two Tier-1s without the Edge cluster selected

From the NSX CLI we can see that all three DR components (1x Tier-0 and 2x Tier-1) exist on the ESXi Transport Node. Below I have combined the output of get logical-routers from both the Edge and the ESXi Transport Node. That way the difference can be seen and the UUIDs can be translated to the component names.

Get logical-routers without the Edge cluster selected

Fortunately, we can change the setting after deployment, at least in NSX version 3.2.0.1. So, now we can select an Edge cluster for both Tier-1s.

Two Tier-1s with the Edge cluster selected

If we now run the same commands on both the edge node and the ESXi Transport node we can easily spot the difference. Marked in green is the output without the Edge cluster selected and marked in red is the output with the Edge cluster selected.

Get logical-routers with both options next to each other

As you can clearly see, with the Edge cluster selected the SR component for the Tier-1s is created on the Edge nodes. Fortunately, we can easily remove the Edge cluster setting from the Tier-1s again, without having to redeploy them. An additional note: during the creation of these screenshots the Gateway firewall was disabled for all Tiers.
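If you want to check the Edge cluster setting across many Tier-1s without clicking through the UI, the configuration can also be inspected as data. The sketch below assumes you have already retrieved the locale-services listing of a Tier-1 from the NSX Policy API (as far as I know this is GET /policy/api/v1/infra/tier-1s/<tier-1-id>/locale-services) and parsed it into a Python dict. The field names results and edge_cluster_path reflect my understanding of that API and should be verified against the API reference for your NSX version. The sample payload is invented.

```python
def tier1_edge_cluster_paths(locale_services_listing):
    """Return the Edge cluster paths found in a Tier-1 locale-services listing."""
    paths = []
    for service in locale_services_listing.get("results", []):
        path = service.get("edge_cluster_path")  # assumed field name, verify for your version
        if path:
            paths.append(path)
    return paths

# Invented sample payloads; the IDs and paths are placeholders.
with_cluster = {
    "results": [
        {
            "id": "default",
            "edge_cluster_path": "/infra/sites/default/enforcement-points/default/edge-clusters/example-id",
        }
    ]
}
without_cluster = {"results": [{"id": "default"}]}

print(bool(tier1_edge_cluster_paths(with_cluster)))     # True: expect a Tier-1 SR on the Edge
print(bool(tier1_edge_cluster_paths(without_cluster)))  # False: distributed-only Tier-1
```

An empty result corresponds to the Not Set situation in the UI, so no Tier-1 SR should be present on the Edge nodes.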

Disabled Gateway firewall on the Tiers

Wrapping things up

When designing and implementing routing in NSX, one must carefully consider whether Tier-1 gateways need to be deployed with services, since this always fundamentally changes the forwarding path in the north-south direction (all traffic that needs to exit the local Tier-1 gateway and must be forwarded to a Tier-0 gateway). This can lead to performance degradation for traffic that is essentially east-west traffic within the same datacenter, since the Edge VM, which hosts the Tier-1 SR component, has now become a 'hairpin device'. As a general rule: do NOT assign an Edge Cluster to a Tier-1 gateway unless absolutely necessary!

Rob also made some nice packet walks and packet captures, but this blog is already becoming quite lengthy. So, that’s fuel for another blog.

Hopefully you have enjoyed reading this blog as much as Rob and I enjoyed creating it. If you have any questions or some cool additions, please leave a comment at the bottom. Thanks for reading.
