
NSX-T multi T0 and Tanzu Deployments


General description

If you are using vSphere with Tanzu in conjunction with NSX, you can implement multiple T0 Gateways to further segregate your network traffic. To do this, you must set one T0 Gateway as the default during installation. Later, you can configure additional T0 Gateways and selectively override the default T0 Gateway on a per-vSphere-Namespace basis.

In a really big environment, I found that if you try to deploy a Tanzu worker cluster in an additional segment, other than the one in which your Supervisor Cluster resides, the cluster never becomes operational. You will see that Tanzu starts the installation but never reports anything back.

Difference between NAT and No-NAT

Both during the installation process and during vSphere Namespace creation, you have the option to enable or disable Network Address Translation (NAT). Enabling NAT entails two important considerations:

  1. Specifying an Egress Network, which must be routed. A dedicated Egress IP is assigned to each vSphere Namespace. When objects such as K8s Nodes or Pods within the vSphere Namespace initiate communication with external entities (DNS, Active Directory, NTP, …), they are source NAT’ed to that Namespace’s Egress IP.
  2. Avoiding the need for Namespace Network routing: Since the communication is source NAT’ed, there is no requirement to route the entire Namespace Network (which contains the K8s Nodes) outside of NSX. To prevent such routing from happening, a RouteMap is automatically installed on the T0 Gateway.

The Problem

If you now create a vSphere Namespace on a T0 Gateway other than the default and try to create a K8s cluster, only the first control plane VM is created, and it never finishes provisioning. Thus, no further nodes are created and the cluster never finishes either.

> kubectl get cluster,machine,vm -A
NAMESPACE    NAME                                PHASE         AGE     VERSION
vns-edge-1   cluster.cluster.x-k8s.io/c-edge-1   Provisioned   2d13h   v1.24.9+vmware.1
vns-edge-2   cluster.cluster.x-k8s.io/c-edge-2   Provisioned   4m19s   v1.24.9+vmware.1

NAMESPACE    NAME                                                           CLUSTER    NODENAME                              PROVIDERID                                       PHASE         AGE     VERSION
vns-edge-1   machine.cluster.x-k8s.io/c-edge-1-9sphb-7xcbv                  c-edge-1   c-edge-1-9sphb-7xcbv                  vsphere://4237b2f8-926a-e5b8-de9d-8a94a5e12748   Running       2d13h   v1.24.9+vmware.1
vns-edge-1   machine.cluster.x-k8s.io/c-edge-1-np1-fxcwf-5c8cf77545-ks95v   c-edge-1   c-edge-1-np1-fxcwf-5c8cf77545-ks95v   vsphere://4237e712-fc19-b3b4-1d26-fd4c3f53e269   Running       2d13h   v1.24.9+vmware.1
vns-edge-2   machine.cluster.x-k8s.io/c-edge-2-4fmhw-fdgsl                  c-edge-2                                         vsphere://4237b573-f30d-9348-d82e-d3ad765a346d   Provisioned   4m4s    v1.24.9+vmware.1
vns-edge-2   machine.cluster.x-k8s.io/c-edge-2-np1-gz5np-fdcbc5cf6-gv8td    c-edge-2                                                                                          Pending       4m9s    v1.24.9+vmware.1

NAMESPACE    NAME                                                                       POWER-STATE   AGE
vns-edge-1   virtualmachine.vmoperator.vmware.com/c-edge-1-9sphb-7xcbv                  poweredOn     2d13h
vns-edge-1   virtualmachine.vmoperator.vmware.com/c-edge-1-np1-fxcwf-5c8cf77545-ks95v   poweredOn     2d13h
vns-edge-2   virtualmachine.vmoperator.vmware.com/c-edge-2-4fmhw-fdgsl                  poweredOn     4m3s
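
To spot the stuck objects without reading the whole table, the Machine phases can be filtered. This is only a minimal sketch: it assumes kubectl is already logged in to the Supervisor Cluster and that the Machine objects expose their phase under .status.phase (which is what the PHASE column above suggests). The function name list_unfinished_machines is made up for illustration.

```shell
#!/usr/bin/env bash
# List every Cluster API Machine whose phase is not "Running".
# Assumption: the current kubectl context points at the Supervisor Cluster.
list_unfinished_machines() {
  kubectl get machines.cluster.x-k8s.io -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase' |
    awk '$3 != "Running" { printf "%s/%s is %s\n", $1, $2, $3 }'
}

# Usage:
#   list_unfinished_machines
```

In the situation shown above, this would be expected to report the two c-edge-2 machines, while an empty result means every Machine reached the Running phase.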

The Cause

In most cases, enabling NAT for your supervisor cluster is the preferred approach. Routing a large network such as the Namespace network is a rather rare scenario.
However, when you create a K8s cluster (kind Cluster or TanzuKubernetesCluster), specific controllers such as tkg-controller or capv need to establish connections to that cluster. These controllers run as Pods on the Supervisor Cluster control plane nodes, and they use a vNIC connected to NSX within one of the Namespace Networks.
When the Namespace Network is not routed (because NAT is enabled and the RouteMap blocks its advertisement), these controllers are unable to establish connections to K8s clusters residing on another T0 Gateway.

The “too-easy fix”

Based on this understanding, one might suggest disabling NAT on the Supervisor during installation. While this approach seems obvious, it does not work in vSphere 8U1 (I haven’t tested other versions). If NAT is disabled during installation, the wizard fails to validate the provided input and generates a generic error message:

Invalid field ‘workloads’ in structure ‘com.vmware.vcenter.namespace_management.supervisor.enable_on_compute_cluster_spec’

The real fix (maybe?)

Another approach is to remove or modify the RouteMap. This comes with another challenge: since this RouteMap was created during installation, it is owned by an NSX superuser, hence it can’t be deleted from the GUI.

[Image: nsx-protected-object]

Instead, you have to change it via the API.
The first step is to get the current RouteMap configuration as JSON:

❯ curl --insecure -u admin:'Password123!' -X GET "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/route-maps"
{
  "results" : [ {
    "entries" : [ {
      "prefix_list_matches" : [ "/infra/tier-0s/T0-Tanzu-1/prefix-lists/pl_domain-c1006:5c194008-2cf0-467f-af69-6786a1daf1c6_deny_t1_subnets" ],
      "action" : "DENY"
    }, {
      "prefix_list_matches" : [ "/infra/tier-0s/T0-Tanzu-1/prefix-lists/prefixlist-out-default" ],
      "action" : "PERMIT"
    } ],
    "resource_type" : "Tier0RouteMap",
    "id" : "rm_deny_t1_subnets",
    "display_name" : "rm-deny-t1-subnets",
    "tags" : [ {
      "scope" : "ncp/created_for",
      "tag" : "ncp/subnets_deny_rule"
    } ],
    "path" : "/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets",
    "relative_path" : "rm_deny_t1_subnets",
    "parent_path" : "/infra/tier-0s/T0-Tanzu-1",
    "remote_path" : "",
    "unique_id" : "c1b634d8-de7d-43ac-b538-6207e6a0f3a1",
    "realization_id" : "c1b634d8-de7d-43ac-b538-6207e6a0f3a1",
    "owner_id" : "2b6402b9-75bf-4b9e-b93f-57791008257c",
    "origin_site_id" : "2b6402b9-75bf-4b9e-b93f-57791008257c",
    "marked_for_delete" : false,
    "overridden" : false,
    "_system_owned" : false,
    "_protection" : "REQUIRE_OVERRIDE",
    "_create_time" : 1688406759421,
    "_create_user" : "wcp-cluster-user-5ce7261a-4769-4a09-a459-b655fa948c1f-544e1c9b-9ac4-41c5-ab64-1ef0a72ca3ff",
    "_last_modified_time" : 1688406759421,
    "_last_modified_user" : "wcp-cluster-user-5ce7261a-4769-4a09-a459-b655fa948c1f-544e1c9b-9ac4-41c5-ab64-1ef0a72ca3ff",
    "_revision" : 0
  } ],
  "result_count" : 1,
  "sort_by" : "display_name",
  "sort_ascending" : true
}

The first entry of the RouteMap (the one with action DENY) references the deny prefix list. Whatever networks are listed in that prefix list are blocked from being advertised. Thus, we will remove this entry from the RouteMap.
To do so, I’ll save the RouteMap to a file, remove the corresponding lines, and re-upload it.

❯ curl --insecure -u admin:'Password123!' -X GET "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets" > rm_deny_t1_subnets.json

❯ sed -i '3,5d' rm_deny_t1_subnets.json

❯ cat rm_deny_t1_subnets.json
{
  "entries" : [ {
    "prefix_list_matches" : [ "/infra/tier-0s/T0-Tanzu-1/prefix-lists/prefixlist-out-default" ],
    "action" : "PERMIT"
  } ],
  "resource_type" : "Tier0RouteMap",
  "id" : "rm_deny_t1_subnets",
  "display_name" : "rm-deny-t1-subnets",
  "tags" : [ {
    "scope" : "ncp/created_for",
    "tag" : "ncp/subnets_deny_rule"
  } ],
  "path" : "/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets",
  "relative_path" : "rm_deny_t1_subnets",
  "parent_path" : "/infra/tier-0s/T0-Tanzu-1",
  "remote_path" : "",
  "unique_id" : "c1b634d8-de7d-43ac-b538-6207e6a0f3a1",
  "realization_id" : "c1b634d8-de7d-43ac-b538-6207e6a0f3a1",
  "owner_id" : "2b6402b9-75bf-4b9e-b93f-57791008257c",
  "origin_site_id" : "2b6402b9-75bf-4b9e-b93f-57791008257c",
  "marked_for_delete" : false,
  "overridden" : false,
  "_system_owned" : false,
  "_protection" : "REQUIRE_OVERRIDE",
  "_create_time" : 1688406759421,
  "_create_user" : "wcp-cluster-user-5ce7261a-4769-4a09-a459-b655fa948c1f-544e1c9b-9ac4-41c5-ab64-1ef0a72ca3ff",
  "_last_modified_time" : 1688406759421,
  "_last_modified_user" : "wcp-cluster-user-5ce7261a-4769-4a09-a459-b655fa948c1f-544e1c9b-9ac4-41c5-ab64-1ef0a72ca3ff",
  "_revision" : 0
}
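
The sed call above deletes by line number, so it only works as long as the JSON is laid out exactly as shown. A slightly more robust variant could filter on the entry’s action instead. This is a sketch under the assumption that jq is installed on the workstation; the function name strip_deny_entries and the output file name are made up for illustration.

```shell
#!/usr/bin/env bash
# Drop every DENY entry from a Tier0RouteMap JSON document.
# Reads the JSON on stdin and writes the filtered JSON to stdout.
# Assumption: jq is available in PATH.
strip_deny_entries() {
  jq '.entries |= map(select(.action != "DENY"))'
}

# Usage (the output file name is a placeholder):
#   strip_deny_entries < rm_deny_t1_subnets.json > rm_deny_t1_subnets.patched.json
```

Writing to a second file rather than redirecting onto the input keeps the original RouteMap around in case the change has to be rolled back.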

Now we can re-apply the RouteMap:

❯ curl --insecure -u admin:'Password123!' -H "X-Allow-Overwrite:true" -H "Content-Type: application/json" --data @rm_deny_t1_subnets.json -X PUT "https://nsxt-1.vraccoon.lab/policy/api/v1/infra/tier-0s/T0-Tanzu-1/route-maps/rm_deny_t1_subnets"

NOTE 1: Overwriting the automagically installed RouteMap is probably not supported.
Also, if I were to do this in production, I’d probably modify the prefix list instead, so that only the NSX-based Supervisor control plane segment is propagated.

It might take a few minutes, but the cluster provisioning will eventually continue.

❯ kubectl get cluster,machine,vm -A
NAMESPACE    NAME                                PHASE         AGE     VERSION
vns-edge-1   cluster.cluster.x-k8s.io/c-edge-1   Provisioned   2d14h   v1.24.9+vmware.1
vns-edge-2   cluster.cluster.x-k8s.io/c-edge-2   Provisioned   61m     v1.24.9+vmware.1

NAMESPACE    NAME                                                           CLUSTER    NODENAME                              PROVIDERID                                       PHASE     AGE     VERSION
vns-edge-1   machine.cluster.x-k8s.io/c-edge-1-9sphb-7xcbv                  c-edge-1   c-edge-1-9sphb-7xcbv                  vsphere://4237b2f8-926a-e5b8-de9d-8a94a5e12748   Running   2d14h   v1.24.9+vmware.1
vns-edge-1   machine.cluster.x-k8s.io/c-edge-1-np1-fxcwf-5c8cf77545-ks95v   c-edge-1   c-edge-1-np1-fxcwf-5c8cf77545-ks95v   vsphere://4237e712-fc19-b3b4-1d26-fd4c3f53e269   Running   2d14h   v1.24.9+vmware.1
vns-edge-2   machine.cluster.x-k8s.io/c-edge-2-4fmhw-fdgsl                  c-edge-2   c-edge-2-4fmhw-fdgsl                  vsphere://4237b573-f30d-9348-d82e-d3ad765a346d   Running   61m     v1.24.9+vmware.1
vns-edge-2   machine.cluster.x-k8s.io/c-edge-2-np1-gz5np-fdcbc5cf6-gv8td    c-edge-2   c-edge-2-np1-gz5np-fdcbc5cf6-gv8td    vsphere://42372d8a-3ca6-a160-2246-a8b44b324df2   Running   61m     v1.24.9+vmware.1

NAMESPACE    NAME                                                                       POWER-STATE   AGE
vns-edge-1   virtualmachine.vmoperator.vmware.com/c-edge-1-9sphb-7xcbv                  poweredOn     2d14h
vns-edge-1   virtualmachine.vmoperator.vmware.com/c-edge-1-np1-fxcwf-5c8cf77545-ks95v   poweredOn     2d14h
vns-edge-2   virtualmachine.vmoperator.vmware.com/c-edge-2-4fmhw-fdgsl                  poweredOn     61m
vns-edge-2   virtualmachine.vmoperator.vmware.com/c-edge-2-np1-gz5np-fdcbc5cf6-gv8td    poweredOn     7m44s

To check connectivity, you can try to ping the corresponding segment’s gateway IP. The K8s nodes themselves do not reply to ICMP requests, but you can run port scans against them, e.g. with netcat.
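
Such a port check could look like the following sketch. It assumes a netcat build that supports the -z (probe only) and -w (timeout) options; the function name check_port and the node IP in the usage line are made up, and 6443 is simply the usual Kubernetes API server port.

```shell
#!/usr/bin/env bash
# Report whether a TCP port answers, since the nodes ignore ICMP.
# -z: only probe the port, send no data; -w 3: give up after three seconds.
check_port() {
  local host="$1" port="$2"
  if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
    echo "$host:$port open"
  else
    echo "$host:$port unreachable"
    return 1
  fi
}

# Usage (10.244.1.2 is a hypothetical control plane node IP):
#   check_port 10.244.1.2 6443
```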

NOTE 2: There is a prefix list on your T0 Gateway named something like “pl-domain-<clusterMOBID>:<supervisor-uuid>-deny-t1-subnets”. Creating a vSphere Namespace with NAT enabled adds an entry (containing the Namespace Network) to that prefix list. This prefix list is then used as the matching criterion in the route map “rm-deny-t1-subnets” to prevent these networks from being advertised.
For some reason, these entries are always added to the prefix list of the default T0 Gateway (the one set up during Supervisor installation). This means that Namespace Networks from vSphere Namespaces on other T0 Gateways (created with “Override Supervisor network settings”) are always advertised, unless you create a route map manually. This is probably a bug.
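
To see which T0 Gateway actually carries the deny entries, the prefix lists of each gateway can be fetched and compared. The sketch below assumes the prefix lists are listed under the same Policy API path that appears in the RouteMap output earlier; the helper name prefix_lists_url and the second gateway name T0-Tanzu-2 are hypothetical.

```shell
#!/usr/bin/env bash
# Build the Policy API URL that lists all prefix lists of one T0 Gateway.
prefix_lists_url() {
  local nsx_host="$1" t0_id="$2"
  echo "https://${nsx_host}/policy/api/v1/infra/tier-0s/${t0_id}/prefix-lists"
}

# Usage: compare the default T0 with an additional one.
# T0-Tanzu-2 is a hypothetical name for the second gateway;
# curl prompts for the admin password because none is given after -u.
#   for t0 in T0-Tanzu-1 T0-Tanzu-2; do
#     curl --insecure -u admin "$(prefix_lists_url nsxt-1.vraccoon.lab "$t0")"
#   done
```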

 

The original Post was from a good friend, vracoon, and can be found here.