Deploying Workloads on vSphere 7 with Tanzu Kubernetes Grid Service (TKGS)

Zercurity
9 min read · Jan 25, 2021

Following on from our last two posts: there are three primary ways of standing up Kubernetes on vSphere, each with their own benefits and drawbacks. This post is the second of three looking at VMware's TKGS.

  • Tanzu Kubernetes Grid (TKG)
    Deploy Kubernetes via Tanzu (TKG) without the need for a licensed Tanzu supervisor cluster. This does not provide a load balancer.
  • Tanzu Kubernetes Grid Service (TKGS)
    Deploy and operate Tanzu Kubernetes clusters natively in vSphere, with HA Proxy as the load balancer. This option does not include VMware Harbor as a container registry.

    - Deploying and configuring HA Proxy
    - Deploying workloads via the supervisor cluster (this post)
    - Creating namespaces and initial cluster configuration
  • VMware Tanzu Kubernetes Grid Integrated Edition (TKGI)
    Fully featured Tanzu deployment with NSX-T.

    - Deploying and configuring NSX-T
    - Deploying workloads via the supervisor cluster
    - Creating namespaces and initial cluster configuration

Prerequisites

If you haven't already, please read through our first post on TKGS, as it provides a lot of detail on what TKGS is and the configuration we'll be using for our deployment.

Deploying TKGS Workloads

Now that we've stood up, configured, and tested our HA Proxy, the next stage is to deploy our supervisor cluster, which will handle the orchestration, deployment, and management of subsequent TKGS clusters.

vSphere Workload Management setup

Head on over to your vSphere dashboard. Under Shortcuts you'll see Workload Management. When you click on this link you'll be presented with a few options. If you see the example below, your organization is already licensed for Tanzu supervisor clusters. This is a new licensing model, separate from the current ESXi + Kubernetes model, and you will require one of these new licenses.

During this transition period, however, VMware does provide a 60-day trial which you can use freely. After it expires your cluster will continue to function, though all management functionality will be disabled.

Once you reach the screen shown below, check whether the NSX-T option is available. If it is, you may want to deploy Tanzu on NSX-T instead; this is the TKGI version of Tanzu, and it is the preferred method of installation given the additional support and functionality included with NSX-T.

Installing TKGS on vCenter Server Network (Supports Tanzu Kubernetes clusters)

Once you've chosen the vCenter Server Network as your installation option, the Workload Management screen will prompt you to choose a destination cluster for the installation, and pre-flight checks will be run against it.

If no cluster is available, switch to the Incompatible tab to see the reason given for the incompatibility so you can resolve it. This will most likely be a licensing issue.

Pre-flight Tanzu checks

The control plane size depends on the size of your deployment; in most cases you'll be fine with a Small VM. Three VMs will be deployed with the specification shown, and together they form the supervisor cluster.

Supervisor cluster sizing options

Right, onto the next step, which is to choose the storage policy we created earlier (a clone of the default vSphere storage policy). Choose your Tanzu storage policy from the list and click Next.

The load balancer section covers both the management IP address of your HA Proxy server and the load balancer IP range we defined for our frontend network. For reference, here they are again:

  • Management network (VLAN 26)
    Management IP: 10.64.2.10
  • Frontend network (VLAN 28)
    Load balancer address range: 10.64.1.1–10.64.1.254 (10.64.1.0/24)

It is critically important that the information entered here reflects that defined in our HA Proxy configuration. Otherwise, you’ll run into all sorts of obscure setup errors.

  • Name
    This is the DNS name of our HA Proxy. This will be the first part of the host name you entered during the HA Proxy setup.
  • Type
    Only HA Proxy is available at the moment — so HA Proxy it is.
  • Data plane API address(es)
    This is the management IP address you provided, including the data plane management API port, which defaults to 5556 (e.g. 10.64.2.10:5556).
  • Username
    During the last stage of the HA Proxy setup you provided a username and password. The username is whatever you entered in the HA Proxy User ID field.
  • Password
    The password that accompanies the username above.
  • IP Address ranges for virtual servers
    These are the IP addresses you provided as your load balancer IP range. In the HA Proxy setup you supplied them as a subnet, but this field requires them as a range. We gave HA Proxy 10.64.1.0/24, so here we enter 10.64.1.1–10.64.1.254. It's very important that this address range does not overlap with any other services or with the HA Proxy server itself.
  • Server certificate authority
    This is the certificate authority you provided at the beginning of the HA Proxy setup. If you elected to have one generated automatically for you, you'll need to grab it from the server. You can either download the certificate (PEM) by visiting the management address in Firefox, or fetch it from the HA Proxy server like so:
scp root@10.64.2.10:/etc/haproxy/ca.crt ca.crt && cat ca.crt
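
If you'd like to sanity-check the certificate before pasting it into the wizard, openssl will confirm the PEM parses and show its subject and expiry (this assumes the file was saved as ca.crt, as in the command above):
openssl x509 -in ca.crt -noout -subject -enddate
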
Configuring HA Proxy within the workload manager

The next stage is simply to re-enter information about your management network.

  • Network
    This is the management virtual distributed switch we created on VLAN 26.
  • Starting IP Address
    Please ensure DHCP is not running on the management network, or that the IP addresses have been reserved. The starting IP address is a fixed address from which the supervisor VMs will be assigned sequentially; as there are three by default, 10.64.2.50–10.64.2.52 will be allocated.
  • Subnet Mask
    Your management network's subnet mask (255.255.254.0).
  • Gateway server
    Your management network's gateway.
  • DNS server
    Your management network's DNS server, or your site-wide DNS server.
  • DNS Search domains
    Your DNS domain. This is marked as optional, but I would provide one as we've found it smooths the setup process and mitigates a few issues (see troubleshooting).
  • NTP server
    Try to provide a local NTP server, or at least use the same NTP servers configured on your ESXi hosts.
Configuring the management network

Almost there. The last step now is to configure the workload network itself. I'd leave the IP address range for Services alone unless you're going to be deploying a large cluster. This IP range is completely internal to Kubernetes and does not reflect any configuration in HA Proxy or the rest of your network.

The DNS server, however, will need to be either your workload network's DNS server or your site-wide DNS server.

  • Workload network (VLAN 27)
    Management IP: 10.64.8.10
    Management gateway: 10.64.8.1
    Network: 10.64.8.0/21
    Subnet mask: 255.255.248.0 (2046 usable IPs)
    Workload starting address range: 10.64.8.50–10.64.15.150

Next we just need to define our workload network. To do this, use the Add button above the table. Once we've filled everything in, it'll look as follows (the workload network configuration dialogue is documented below):

Workload configuration screen for Tanzu TKGS

On the workload network screen, just as in prior steps, we simply need to re-enter our details.

  • Name
    You can give it any name you like; however, it must be alphanumeric.
  • Port group (V27 Tanzu workload)
    Select our Tanzu workload port group, the same one HA Proxy uses.
  • Gateway (10.64.8.1)
    Your Tanzu workload gateway IP address.
  • Subnet (255.255.248.0)
    Your Tanzu workload subnet, 255.255.248.0 (2046 usable IPs).
  • IP Address ranges (10.64.8.50–10.64.15.150)
    This is the address range you want to be used for the VMs provisioned by the supervisor cluster; these VMs will form your Kubernetes worker nodes.
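
If you want to double-check the subnet maths before committing, the Debian-style ipcalc (assuming you have it installed) will confirm that 10.64.8.0/21 gives 2046 usable hosts and show the first and last usable addresses, so you can verify the range above sits inside it:
ipcalc 10.64.8.0/21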

Click Next and you'll see the workload network appear in the table. You can define multiple networks, which is especially useful if you have smaller subnets that you need to define within a congested address space.

Defining our workload cluster

Finally, choose our Tanzu content library. This provides the installation process with the machine templates required for standing up the supervisor cluster.

Tanzu content library

Hey presto! Review, and then re-review, all your settings and check they match those in our HA Proxy configuration. Once the process starts it can take up to 30 minutes to complete, so measure twice, cut once. :)

Tanzu setup finalization

Finally, click Finish and grab a beer.

Tanzu supervisor cluster installation

As the installation kicks off you will see a load of warnings and errors like the ones here. This is normal and the setup process will retry the actions. These errors usually appear whilst the system waits for another component to complete its setup, so just be patient; the setup can take 15–30 minutes.

Tanzu errors during setup
Resource Type Deployment, Identifier vmware-system-netop/vmware-system-netop-controller-manager is not found.

This error means the installer is waiting for the vmware-system-netop image to be pulled and start running. If the issue persists after 30 minutes, there is most likely a network configuration issue on the management network whilst the supervisor cluster is trying to start. Check that traffic is routable on the management network and that the DNS server is reachable.
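
While you wait, a rough sanity check from a machine on the management VLAN looks something like this (10.64.2.2 and vcenter.test.corp are placeholders for your own DNS server and vCenter name):
# Can the management network resolve names via your DNS server?
nslookup vcenter.test.corp 10.64.2.2
# Is the HA Proxy management address reachable?
ping -c 3 10.64.2.10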

No resources of type Pod exist for cluster domain-c8
Kubernetes cluster health endpoint problem at <IP unassigned>. Details: Waiting for API Master IP assignment.

This is common whilst the VMs spin up the pods for the supervisor cluster.

Once you can see three healthy-looking supervisor VMs and the Namespaces overview showing green, you'll be ready to deploy your first namespace. This will contain the Kubernetes cluster itself, to which you can deploy workloads. We'll cover that in our next post.

Our working Tanzu cluster.

Connecting to our new supervisor cluster

Lastly, to check everything is in order, copy and paste the IP address visible in the Overview pane into your browser. It will lead you to a page like this:

Kubernetes CLI tools landing page

This is our HA Proxy landing page, which has automatically been configured to point to the new supervisor cluster. You can SSH onto the HA Proxy node and check the HA Proxy configuration in /etc/haproxy/haproxy.cfg to see what's going on.
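
As a rough way to confirm the data plane API has written the new frontends and backends into that config (assuming root SSH access to the HA Proxy appliance, as we used earlier to fetch the CA certificate):
ssh root@10.64.2.10 "grep -E 'frontend|backend|server' /etc/haproxy/haproxy.cfg"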

Troubleshooting

If you can't visit this page, first check whether HA Proxy has been configured. If you still can't reach the page, check out our HA Proxy troubleshooting steps here. If that still leads to no success, you can double-check the configuration of your supervisor cluster from the Configuration page of your VMware cluster.

Tanzu configuration settings overview

These settings must match those set in HA Proxy. If there is a mismatch you'll have to start over. If you're still stuck, try deploying everything onto one subnet, get that working, and then move on from there.

You can also check the supervisor logs under the monitoring section (Kubernetes Events). These only show events within the past hour.

Kubernetes events
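
Once you can reach the cluster with kubectl (see the next section), a similar view of events is also available from the command line, sorted by time:
kubectl get events -A --sort-by=.lastTimestamp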

Connecting to the supervisor cluster

The tool set is available for Windows, Mac and Linux from the landing page. On Linux I had to guess the download URL manually:

wget https://10.64.32.1/wcp/plugin/linux-amd64/vsphere-plugin.zip

Copy the extracted files into your /usr/local/bin/ directory, ensuring they're executable: chmod +x /usr/local/bin/kubectl /usr/local/bin/kubectl-vsphere.
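
For reference, the whole Linux install sequence looks roughly like this (in our download the zip extracted to a bin/ directory; adjust the paths if yours differs):
unzip vsphere-plugin.zip
sudo cp bin/kubectl bin/kubectl-vsphere /usr/local/bin/
sudo chmod +x /usr/local/bin/kubectl /usr/local/bin/kubectl-vsphere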

$ kubectl vsphere login --server=10.64.32.1 --insecure-skip-tls-verify

Username: administrator@test.corp
Password:
Logged in successfully.

You have access to the following contexts:
10.64.32.1

If the context you wish to use is not in this list, you may need to try logging in again later, or contact your cluster administrator.

To change context, use `kubectl config use-context <workload name>`
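
If you want to see the full list of contexts at any point, kubectl can print them:
$ kubectl config get-contexts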

Now you'll have a context to choose from. Once you've created a few namespaces, they'll also appear here. Though for now:

$ kubectl config use-context 10.64.32.1

You can then check the status of the supervisor cluster:

$ kubectl get nodes
NAME              STATUS   ROLES    AGE     VERSION
420f1f569680d..   Ready    master   5d17h   v1.18.2-6+38ac483e736488
420fdbfa2080b..   Ready    master   5d17h   v1.18.2-6+38ac483e736488
420fe058fd6bd..   Ready    master   5d17h   v1.18.2-6+38ac483e736488
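
It's also worth a quick look at the system pods; everything in the kube-system and vmware-system-* namespaces should settle into a Running or Completed state:
$ kubectl get pods -A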

You can also check out all of VMware Tanzu's available APIs like so; this is helpful for debugging and checking the status of nodes:

kubectl api-resources --namespaced=false
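
If you only want the Tanzu-specific resources, you can filter by API group; in our environment the TKGS cluster objects live under run.tanzu.vmware.com (treat the group name as an assumption if your build differs):
kubectl api-resources --api-group=run.tanzu.vmware.com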

It's all over!

Hopefully that’s given you a quick dive into standing up TKGS on vSphere. In our next post we’ll be looking at creating our first namespace. Please feel free to get in touch if you have any questions.
