Deploying Kubernetes (k8s) on vSphere 7 with Tanzu Kubernetes Grid (TKG)
For a change of gear from our usual Osquery posts: over the last year we’ve done a number of Zercurity deployments onto Kubernetes, the most common being on-prem with VMware’s vSphere. Fortunately, as of the most recent release of VMware’s vCenter you can easily deploy Kubernetes with VMware’s Tanzu Kubernetes Grid (TKG).
This post will form part of a series of posts on running Zercurity on top of Kubernetes in a production environment.
As with all things, there are a number of ways to deploy and manage Kubernetes on VMware. If you’re running the latest release of vCenter (7.0.1.00100) you can deploy a TKG cluster straight from the Workload Management screen, right from the main dashboard, which has a full guide to walk you through the setup process. This also works alongside NSX-T Data Center for additional management functionality and networking.
However, for the purposes of this post, and to support older versions of ESX (vSphere 6.7u3 and vSphere 7.0) and vCenter, we’re going to be using the TKG client utility, which spins up its own simple-to-use web UI for deploying Kubernetes.
Installing Tanzu Kubernetes Grid (TKG)
Right, first things first. Visit the TKG download page. It’s important that you download the following packages appropriate for your client platform (we’ll be using Linux):
Note: You will require a VMware account to download these files.
- VMware Tanzu Kubernetes Grid 1.2.0 CLI
tkg is used to install, manage and upgrade the Kubernetes cluster running on top of vCenter.
- VMware Tanzu Kubernetes Grid 1.2.0 OVAs for Kubernetes
The TKG setup requires the Photon container image photon-3-v1.17.3_vmware.2.ova, which is used for both the worker and management VMs.
- Kubectl 1.19.1 for VMware Tanzu Kubernetes Grid 1.2.0
kubectl is a command-line tool used to administer your Kubernetes cluster.
- VMware Tanzu Kubernetes Grid Extensions Manifest 1.2.0
Additional services, configuration and RBAC changes that are applied to your cluster post installation.
The following steps use Ubuntu Linux. Notes will be added for additional platforms.
With the kubectl archive downloaded, extract it and install the kubectl binary into your system or user PATH. The commands below use the Mac OSX file name; on Linux, substitute the file name of the Linux binary:
sudo mv kubectl-mac-v1.19.1-vmware.2 /usr/local/bin/kubectl
sudo chmod +x /usr/local/bin/kubectl
Docker is required as the TKG installer spins up several docker containers used to connect to and configure the remote vCenter server and its subsequent VMs.
If you’re running Mac OSX you can make use of the Docker Desktop app.
sudo apt-get update
sudo apt-get install apt-transport-https \
ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
Lastly, you’ll need to give your current user permission to interact with the docker daemon.
sudo usermod -aG docker <your-user>
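Group changes only take effect in a new login session, so log out and back in (or start a new shell with newgrp docker) before continuing. As a quick check, the sketch below tests whether an account has been added to a group. Note that user_in_group is a hypothetical helper name used for illustration, not part of Docker.

```shell
# Hypothetical helper: succeeds if <user> belongs to <group>.
# Usage: user_in_group <user> <group>
user_in_group() {
  # id -nG prints the user's group names separated by spaces;
  # put one name per line and look for an exact match.
  id -nG "$1" | tr ' ' '\n' | grep -qxF "$2"
}

# Example: user_in_group "$USER" docker && echo "docker group OK"
```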
Installing Tanzu (tkg)
tkg is a binary application used to install, upgrade and manage your Kubernetes cluster on top of VMware vSphere.
tar -zxvf tkg-linux-amd64-v1.2.0-vmware.1.tar.gz
sudo mv tkg-linux-amd64-v1.2.0+vmware.1 /usr/local/bin/tkg
sudo chmod +x /usr/local/bin/tkg
Once installed, you can run tkg version to check that tkg is installed on your system and working.
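Before moving on, it can be worth confirming that all three command-line tools from the steps above are on your PATH. The following is a minimal sketch; missing_tools is a hypothetical helper name, not something shipped with TKG.

```shell
# Hypothetical helper: print the name of every tool that is NOT on PATH.
# Prints nothing when all tools are found.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Example: missing_tools tkg kubectl docker
```

An empty result means everything the installer needs is reachable from your shell.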
Importing the OVA images
From the Hosts and Clusters view, right click on your datacenter and you’ll see the option “Deploy OVF template”. Select the OVA downloaded from the VMware TKG downloads page. We’re using photon-3-v1.17.3_vmware.2.ova. Then simply follow the on-screen steps.
Once the OVA has been imported it’s deployed as a VM. Do not power on the VM. The last step is to convert it into a template so it can be used by the TKG installer.
Right click on the imported VM photon-3-kube-v1.19.1+vmware.2a, select the “Template” menu item and choose “Convert to template”. This will take a few moments, after which the template will be visible under the “VMs and Templates” view.
You may also choose to configure a dedicated network and/or resource pool for your k8s cluster. This can be done directly from the vSphere web UI. If you’re configuring a new network, please ensure nodes deployed to that network will receive an IP address via DHCP and can connect to the internet.
Installing Tanzu Kubernetes Grid
Once all the prerequisites are met, launch the tkg web installer:
tkg init --ui
Once you run the command, your browser should automatically open and point to: http://127.0.0.1:8080/
Choose the “VMware vSphere” deploy option. The next series of steps will help configure the TKG deployment.
The first step is to connect to your vSphere vCenter instance with your administrator credentials. Upon clicking “connect” you’ll see your available datacenters show up. TKG also requires your SSH RSA public key, which the management cluster uses for management access.
Your SSH RSA key is usually located within your home directory, by default under ~/.ssh/.
If the file doesn't exist or you need to create a new RSA key you can generate one like so:
ssh-keygen -t rsa -C "email@example.com"
Once the command has run you’ll see two files created, named after the filename you chose if you changed the default. You need to copy and paste the contents of your public key into the installer.
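Assuming the default OpenSSH location (~/.ssh/id_rsa, with the public half in ~/.ssh/id_rsa.pub), the following sketch prints the public key so it can be pasted into the installer. show_public_key is a hypothetical helper name; pass a different private key path if you changed the filename.

```shell
# Hypothetical helper: print the public key that belongs to a private key.
# Usage: show_public_key [private-key-path]   (default: ~/.ssh/id_rsa)
show_public_key() {
  key="${1:-$HOME/.ssh/id_rsa}"
  if [ -f "$key.pub" ]; then
    cat "$key.pub"
  else
    echo "No public key at $key.pub; generate one with ssh-keygen" >&2
    return 1
  fi
}

# Example: show_public_key
```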
The next stage is to name your cluster and provide the sizing details of both the management instances and worker instances for your cluster. The virtual IP address is the main IP address of the Kubernetes API server, i.e. the load-balanced endpoint used to reach the cluster’s control plane.
For the next stage you can provide some optional metadata or labels to make it easier to identify your VMs.
The next stage is to define the resource location. This is where your Kubernetes cluster will reside and the datastore used by the virtual machines. We’re using the root VM folder, our vSAN datastore and, lastly, a separate resource pool we’ve created called k8s-prod to manage the cluster’s CPU, storage and memory limits.
With the networking configuration, you can use the defaults provided here. However, we’ve created a separate distributed switch called VM Tanzu Prod, which is connected via its own segregated VLAN back into our network.
The final stage is to select the Photon Kube OVA which we imported earlier as the base image for the worker and management virtual machines. If nothing is listed here, make sure you have imported the OVA and converted it from a VM into a template. Use the refresh icon to reload the list without starting over.
Finally, review your configuration and click “Deploy management cluster”. This can take around 8 to 10 minutes, or longer depending on your internet connection, as the setup needs to pull down and deploy multiple images for the Docker containers which are used to bootstrap the Tanzu management cluster.
Once the installation has finished you’ll see several VMs within the vSphere web client named something similar to tkg-mgmt-vsphere-20200927183052-control-plane-6rp25. At this point the Tanzu management plane has been deployed.
We now need to configure kubectl (used to deploy pods and interact with the Kubernetes cluster) to use our new cluster as its primary context (the one marked with an asterisk in kubectl’s list of contexts). Grab the cluster credentials with:
tkg get credentials zercurity
Credentials of workload cluster 'zercurity' have been saved
You can now access the cluster by running 'kubectl config use-context zercurity-admin@zercurity'
Copy the kubectl command from the output above and run it to set your new context. This is also useful for switching between multiple clusters:
kubectl config use-context zercurity-admin@zercurity
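The asterisk mentioned above appears in the CURRENT column of kubectl config get-contexts, and kubectl config current-context prints just the name of the active context. Building on that, here is a small sketch that switches context only when needed. ensure_context is a hypothetical helper name used for illustration.

```shell
# Hypothetical helper: switch to the given context unless it is already active.
# Usage: ensure_context <context-name>
ensure_context() {
  want="$1"
  # current-context prints only the name of the active context
  current=$(kubectl config current-context 2>/dev/null)
  if [ "$current" = "$want" ]; then
    echo "already using $want"
  else
    kubectl config use-context "$want"
  fi
}

# Example: ensure_context zercurity-admin@zercurity
```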
With kubectl connected to our cluster, let’s create our first namespace to check everything is working correctly.
kubectl create namespace zercurity
At this stage you’re almost ready to go and you can start deploying non-persistent containers to test out the cluster.
Installing the VMware TKG extensions
VMware provides a number of helpful extensions to add monitoring, logging and ingress services for web-based (HTTP/HTTPS) deployments via Contour. Note that TCP/IP ingress isn’t supported.
The extensions archive should already have been downloaded from the TKG download page.
I’d recommend applying the following extensions. There are a few more contained within the archive. However, I’d argue these are the primary extensions you’re going to want to add.
kubectl apply -f cert-manager/
kubectl apply -f ingress/contour/
kubectl apply -f monitoring/grafana/
kubectl apply -f monitoring/prometheus/
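If you’d rather apply the extensions in one go and stop at the first failure, a loop like the following is one option. This is a sketch that assumes each directory contains manifests that can be applied directly, as in the commands above, and that you run it from the root of the extracted extensions archive. apply_extension_dirs is a hypothetical helper name. It passes each directory to kubectl apply -f, which accepts a directory as well as a single file.

```shell
# Hypothetical helper: apply each extension directory in order,
# stopping at the first one that kubectl rejects.
# Usage: apply_extension_dirs <dir> [<dir> ...]
apply_extension_dirs() {
  for dir in "$@"; do
    echo "Applying $dir"
    if ! kubectl apply -f "$dir/"; then
      echo "Failed while applying $dir" >&2
      return 1
    fi
  done
}

# Example:
# apply_extension_dirs cert-manager ingress/contour monitoring/grafana monitoring/prometheus
```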
Configuring vSAN storage
This is the last stage, I promise. It’s also critical if you intend on using persistent disks (persistent volume claims, PVCs) alongside your deployed pods. In this last part I’m also assuming you’re using vSAN, as it has native support for container volumes.
In order to let the Kubernetes cluster know to use vSAN as its storage backend we need to create a new StorageClass. To make this change, simply copy and paste the command below:
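cat <<EOF | kubectl apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: thin
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.vsphere.vmware.com
allowVolumeExpansion: true
parameters:
  storagepolicyname: "vSAN Default Storage Policy"
EOF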
You can then check your StorageClass has been correctly applied like so:
kubectl get sc
NAME             PROVISIONER     RECLAIM   BINDINGMODE   EXPANSION   AGE
thin (default)   csi.vsphere..   Delete    Immediate     true        2s
kubectl describe sc thin
You can also test that your StorageClass config is working by creating a quick PersistentVolumeClaim. Again, copy and paste the command below.
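cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: testing
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
The PersistentVolumeClaim will be created within the default namespace using 1Gi of disk space.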
If the STATUS column shows Pending for more than 30 seconds, this usually means something has gone wrong. You can use the kubectl describe sc thin command to get additional information on the state of the StorageClass.
kubectl get pvc
NAME      STATUS   VOLUME         CAPACITY   ACCESS   STORAGECLASS   AGE
testing   Bound    pvc-3974d60f   1Gi        RWO      default        6s
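Rather than re-running kubectl get pvc by hand, you could poll the claim until it reports Bound. The following is a minimal sketch, assuming the claim is named testing as in the output above and lives in the current namespace. wait_for_pvc is a hypothetical helper name, and the attempt count and delay are arbitrary defaults.

```shell
# Hypothetical helper: poll a PVC until its phase is Bound.
# Usage: wait_for_pvc <name> [attempts] [delay-seconds]
wait_for_pvc() {
  name="$1"
  attempts="${2:-30}"
  delay="${3:-1}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # jsonpath extracts just the phase (Pending, Bound, ...)
    phase=$(kubectl get pvc "$name" -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" = "Bound" ]; then
      echo "PVC $name is Bound"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "PVC $name still '$phase' after $attempts checks; try: kubectl describe pvc $name" >&2
  return 1
}

# Example: wait_for_pvc testing 30 1
```

If the claim never binds, kubectl describe pvc testing lists the events recorded against the claim itself, which is often the quickest way to see why provisioning failed.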
Upgrading Kubernetes on TKG
This is surprisingly easy using the tkg command. All you need to do is get the management cluster id using the tkg get management-cluster command, then run tkg upgrade management-cluster with your management cluster id. This will automatically update the Kubernetes control plane and worker nodes.
tkg upgrade management-cluster tkg-mgmt-vsphere-20200927183052
Upgrading management cluster 'tkg-mgmt-vsphere-20200927183052' to TKG version 'v1.2.0' with Kubernetes version 'v1.19.1+vmware.2'. Are you sure? [y/N]: y
Upgrading management cluster providers...
Checking cert-manager version...
Deleting cert-manager Version="v0.11.0"
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Deleting Provider="cluster-api" Version="" TargetNamespace="capi-system"
Installing Provider="cluster-api" Version="v0.3.10" TargetNamespace="capi-system"
Deleting Provider="bootstrap-kubeadm" Version="" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.10" TargetNamespace="capi-kubeadm-bootstrap-system"
Deleting Provider="control-plane-kubeadm" Version="" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.10" TargetNamespace="capi-kubeadm-control-plane-system"
Deleting Provider="infrastructure-vsphere" Version="" TargetNamespace="capv-system"
Installing Provider="infrastructure-vsphere" Version="v0.7.1" TargetNamespace="capv-system"
Management cluster providers upgraded successfully...
Upgrading management cluster kubernetes version...
Verifying kubernetes version...
Retrieving configuration for upgrade cluster...
Create InfrastructureTemplate for upgrade...
Upgrading control plane nodes...
Patching KubeadmControlPlane with the kubernetes version v1.19.1+vmware.2...
Waiting for kubernetes version to be updated for control plane nodes
Upgrading worker nodes...
Patching MachineDeployment with the kubernetes version v1.19.1+vmware.2...
Waiting for kubernetes version to be updated for worker nodes...
updating 'metadata/tkg' add-on...
Management cluster 'tkg-mgmt-vsphere-20200927183052' successfully upgraded to TKG version 'v1.2.0' with kubernetes version 'v1.19.1+vmware.2'
Congratulations, you’ve now got a Kubernetes cluster up and running on top of your VMware cluster. In the next post we’ll be looking at deploying PostgreSQL into our cluster ready for our instance of Zercurity.
If you’ve got stuck or have a few suggestions for us to add don’t hesitate to get in touch via our website or leave a comment below.