Deploying Kubernetes (k8s) on vSphere 7 with Tanzu Kubernetes Grid (TKG)

Zercurity
Nov 23, 2020 · 10 min read


For a change of gear away from our usual Osquery posts: over the last year we’ve done a number of Zercurity deployments onto Kubernetes, with the most common being on-prem with VMware’s vSphere. Fortunately, as of the most recent release of VMware’s vCenter, you can easily deploy Kubernetes with VMware’s Tanzu Kubernetes Grid (TKG).

This post will form part of a series of posts on running Zercurity on top of Kubernetes in a production environment.

As with all things, there are a number of ways to deploy and manage Kubernetes on VMware. If you’re running the latest release of vCenter (7.0.1.00100) you can deploy a TKG cluster straight from the Workload Management screen on the main dashboard, which has a full guide to walk you through the setup process. This also works alongside NSX-T Data Center for additional management and networking functionality.

TKG Deployment setup on vCenter 7.0.1.00100 via Workload Management

However, for the purposes of this post, and to support older versions of ESXi (vSphere 6.7u3 and vSphere 7.0) and vCenter, we’re going to be using the TKG client utility, which spins up its own simple-to-use web UI for deploying Kubernetes.

Installing Tanzu Kubernetes Grid (TKG)

Right, first things first. Visit the TKG download page. It’s important that you download the following packages appropriate for your client platform (we’ll be using Linux):

Note: You will require a VMware account to download these files.

  • VMware Tanzu Kubernetes Grid 1.2.0 CLI
    tkg is used to install, manage and upgrade the Kubernetes cluster running on top of vCenter.
  • VMware Tanzu Kubernetes Grid 1.2.0 OVAs for Kubernetes
    The Photon container image photon-3-v1.17.3_vmware.2.ova is required to deploy TKG. It is used for both the worker and management VMs.
  • Kubectl 1.19.1 for VMware Tanzu Kubernetes Grid 1.2.0
    kubectl is a command line tool used to administer your Kubernetes cluster from the command line.
  • VMware Tanzu Kubernetes Grid Extensions Manifest 1.2.0
    Additional services, configuration and RBAC changes that are applied to your cluster post installation.
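Once everything has downloaded you should have files similar to the following (the exact names and versions will depend on the release you picked; the listing below is just a sketch based on the packages above):

ls ~/Downloads
kubectl-linux-v1.19.1-vmware.2.gz
photon-3-v1.17.3_vmware.2.ova
tkg-linux-amd64-v1.2.0-vmware.1.tar.gz
tkg-extensions-manifests-v1.2.0-vmware.1.tar.gz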

Prerequisites

The following steps use Ubuntu Linux. Notes will be added for additional platforms.

Installing kubectl

With the “Kubectl 1.19.1 for VMware Tanzu Kubernetes Grid 1.2.0” package downloaded, simply extract the archive and install the kubectl binary into your system or user PATH. If you’re using Mac OSX you can use the same commands below, just substitute darwin for linux.

wget https://download2.vmware.com/software/TKG/1.2.0/kubectl-linux-v1.19.1-vmware.2.gz
gunzip kubectl-linux-v1.19.1-vmware.2.gz
sudo mv kubectl-linux-v1.19.1-vmware.2 /usr/local/bin/kubectl
sudo chmod +x /usr/local/bin/kubectl
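You can confirm the binary is on your PATH and executable with a quick client-only version check:

kubectl version --client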

Installing docker

Docker is required as the TKG installer spins up several docker containers used to connect to and configure the remote vCenter server and its subsequent VMs.

If you’re running Mac OSX you can make use of the Docker Desktop app.

sudo apt-get update
sudo apt-get install apt-transport-https \
ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

Lastly, you’ll need to give your current user permission to interact with the docker daemon.

sudo usermod -aG docker <your-user>
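You’ll need to log out and back in (or start a new shell session) for the group change to take effect. You can then confirm Docker works without sudo by running the standard hello-world test image:

docker run --rm hello-world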

Installing Tanzu (tkg)

The tkg binary is used to install, upgrade and manage your Kubernetes cluster on top of VMware vSphere.

wget https://download2.vmware.com/software/TKG/1.2.0/tkg-linux-amd64-v1.2.0-vmware.1.tar.gz
tar -zxvf tkg-linux-amd64-v1.2.0-vmware.1.tar.gz
cd tkg
sudo mv tkg-linux-amd64-v1.2.0+vmware.1 /usr/local/bin/tkg
sudo chmod +x /usr/local/bin/tkg

Once installed you can run tkg version to check tkg is working and installed into your system PATH.
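Running a tkg command for the first time also creates the CLI’s configuration directory (typically ~/.tkg, containing config.yaml and the provider templates), which the installer uses in the following steps. A quick sanity check:

tkg version
ls ~/.tkg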

Importing the OVA images

From the Hosts and Clusters view, right click on your data-center and you’ll see the option “Deploy OVF template”. Select the OVA downloaded from the VMware TKG downloads page (we’re using photon-3-v1.17.3_vmware.2.ova), then simply follow the on-screen steps.

Once the OVA has been imported it’s deployed as a VM. Do not power on the VM. The last step is to convert it back into a template so it can be used by the TKG installer.

Right click on the imported VM photon-3-kube-v1.19.1+vmware.2a, select the “Template” menu item and choose “Convert to template”. This will take a few moments and the template will then be visible under the “VMs and Templates” view.

Optional prerequisites

You may also choose to configure a dedicated network and/or resource pool for your k8s cluster. This can be done directly from the vSphere web UI. If you’re configuring a new network, please ensure nodes deployed to that network will receive an IP address via DHCP and can connect to the internet.

Installing Tanzu Kubernetes Grid

Once all the prerequisites are met launch the tkg web installer:

tkg init --ui

Once you run the command your browser should automatically open at http://127.0.0.1:8080/
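If you’re running the installer on a remote Linux host rather than your local workstation, the UI is only served on the loopback address, so one option is to forward the port over SSH and then open the same URL locally (a minimal sketch; tkg-host is a placeholder for your remote machine):

ssh -L 8080:127.0.0.1:8080 user@tkg-host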

The Tanzu Kubernetes Grid installation screen.

Choose the “VMware vSphere” deploy option. The next series of steps will help configure the TKG deployment.

The first step is to connect to your vSphere vCenter instance with your administrator credentials. Upon clicking “Connect” you’ll see your available data-centers show up. TKG also requires your SSH RSA public key, which is used for access to the management cluster nodes.

TKG installation setup.

Your SSH RSA key is usually located within your home directory:

cat ~/.ssh/id_rsa.pub

If the file doesn't exist or you need to create a new RSA key you can generate one like so:

ssh-keygen -t rsa -C "your@email.com"

Once the command has run you’ll see two files created. You need to copy and paste the contents of your public key (the .pub file).

The next stage is to name your cluster and provide the sizing details of both the management and worker instances for your cluster. The virtual IP address is the address used to reach the cluster’s Kubernetes API server (the control plane endpoint).

Choosing the cluster instance types for vSphere Kubernetes

For the next stage you can provide some optional metadata or labels to make it easier to identify your VMs.

The next stage is to define the resource location. This is where your Kubernetes cluster will reside, along with the datastore used by the virtual machines. We’re using the root VM folder, our vSAN datastore and, lastly, a separate resource pool we’ve created called k8s-prod to manage the cluster’s CPU, storage and memory limits.

Managing the resources for your Kubernetes cluster.

For the networking configuration you can use the defaults provided here. However, we’ve created a separate distributed switch called VM Tanzu Prod, which is connected via its own segregated VLAN back into our network.

Defining the Tanzu network resources.

The final stage is to select the Photon Kube OVA which we downloaded earlier as the base image for the worker and management virtual machines. If nothing is listed here, make sure you have imported the OVA and converted the VM into a template. Use the refresh icon to reload the list without starting over.

Choosing the Photon OVA for deployment.

Finally, review your configuration and click “Deploy management cluster”. This can take around 8–10 minutes, or even longer depending on your internet connection, as the setup needs to pull down and deploy multiple images for the Docker containers which are used to bootstrap the Tanzu management cluster.

Configuration review for the Tanzu Kubernetes cluster.

Once the installation has finished you’ll see several VMs within the vSphere web client named something similar to tkg-mgmt-vsphere-20200927183052-control-plane-6rp25. At this point the Tanzu management plane has been deployed.

Success. \o/

Our deployed Tanzu cluster.

Configuring kubectl

We now need to configure the kubectl command (used to deploy pods and interact with the Kubernetes cluster) to use our new cluster as our primary context (shown by the asterisk).
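The credentials below are for a workload cluster named zercurity. If you haven’t created a workload cluster yet, one can be deployed from the management cluster with the tkg create cluster command (a minimal sketch; the dev plan and cluster name are assumptions, adjust them for your environment):

tkg create cluster zercurity --plan dev

Once the cluster is up, grab its credentials with: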

tkg get credentials zercurity
Credentials of workload cluster 'zercurity' have been saved
You can now access the cluster by running 'kubectl config use-context zercurity-admin@zercurity'

Copy and paste the kubectl command from the output above to set your new context. This is useful for switching between multiple clusters:

kubectl config use-context zercurity-admin@zercurity
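You can list all configured contexts and confirm the new one is active (marked with an asterisk) with:

kubectl config get-contexts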

With kubectl connected to our cluster let’s create our first namespace to check everything is working correctly.

kubectl version
kubectl create namespace zercurity

At this stage you’re almost ready to go and you can start deploying non-persistent containers to test out the cluster.
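For example, a throwaway nginx deployment is a quick way to confirm that pods schedule and images pull correctly (the deployment name here is just an example; delete it once you’re done):

kubectl create deployment nginx-test --image=nginx --namespace zercurity
kubectl get pods --namespace zercurity
kubectl delete deployment nginx-test --namespace zercurity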

Installing the VMware TKG extensions

VMware provides a number of helpful extensions to add monitoring, logging and ingress services for web-based (HTTP/HTTPS) deployments via Contour. Note that TCP/IP ingress isn’t supported.

The extensions archive should already have been downloaded from www.vmware.com/go/get-tkg.

wget https://download2.vmware.com/software/TKG/1.2.0/tkg-extensions-manifests-v1.2.0-vmware.1.tar.gz
tar -zxvf tkg-extensions-manifests-v1.2.0-vmware.1.tar.gz
cd tkg-extensions-v1.2.0+vmware.1/

I’d recommend applying the following extensions. There are a few more contained within the archive. However, I’d argue these are the primary extensions you’re going to want to add.

kubectl apply -f cert-manager/
kubectl apply -f ingress/contour/
kubectl apply -f monitoring/grafana/
kubectl apply -f monitoring/prometheus/
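The extensions deploy into their own namespaces; a simple way to check everything has rolled out is to look for pods that aren’t in the Running state across all namespaces:

kubectl get pods --all-namespaces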

Configuring vSAN storage

This is the last stage, I promise. It’s also critical if you intend on using persistent disks (persistent volume claims, PVCs) alongside your deployed pods. In this last part I’m also assuming you’re using vSAN, as it has native support for container volumes.

Tanzu VMware vSAN persistent volume claims.

In order to let the Kubernetes cluster know to use vSAN as its storage backend we need to create a new StorageClass. To make this change, simply copy and paste the command below:

cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: thin
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.vsphere.vmware.com
allowVolumeExpansion: true
parameters:
  storagepolicyname: "vSAN Default Storage Policy"
EOF

You can then check your StorageClass has been correctly applied like so:

kubectl get sc
NAME             PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
thin (default)   csi.vsphere.vmware.com   Delete          Immediate           true                   2s
kubectl describe sc thin

You can also test your StorageClass config is working by creating a quick PersistentVolumeClaim — again, copy and paste the command below.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: testing
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF

This PersistentVolumeClaim will be created within the default namespace using 1Gi of disk space.

In the event the Status shows the Pending state for more than 30 seconds, this usually means some sort of issue has occurred. You can use the kubectl describe pvc testing command to get additional information on the state of the claim, and kubectl describe sc thin for the StorageClass itself.

kubectl get pvc
NAME      STATUS   VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
testing   Bound    pvc-3974d60f   1Gi        RWO            thin           6s
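If you want to go a step further and prove a pod can actually mount the claim, here’s a minimal test pod (the pod name and busybox image are just example choices) that writes a file to the volume:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pvc-test
spec:
  containers:
    - name: busybox
      image: busybox
      command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: testing
EOF

Once you’re finished testing, kubectl delete pod pvc-test and kubectl delete pvc testing will clean everything up.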

Upgrading Kubernetes on TKG

This is surprisingly easy using the tkg command. All you need to do is get the management cluster name using the tkg get management-cluster command, then run tkg upgrade management-cluster with that name. This will automatically update the Kubernetes control plane and worker nodes.
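For example, listing the registered management clusters first (the names will differ in your environment):

tkg get management-cluster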

tkg upgrade management-cluster tkg-mgmt-vsphere-20200927183052
Upgrading management cluster 'tkg-mgmt-vsphere-20200927183052' to TKG version 'v1.2.0' with Kubernetes version 'v1.19.1+vmware.2'. Are you sure? [y/N]: y
Upgrading management cluster providers...
Checking cert-manager version...
Deleting cert-manager Version="v0.11.0"
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Performing upgrade...
Deleting Provider="cluster-api" Version="" TargetNamespace="capi-system"
Installing Provider="cluster-api" Version="v0.3.10" TargetNamespace="capi-system"
Deleting Provider="bootstrap-kubeadm" Version="" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.10" TargetNamespace="capi-kubeadm-bootstrap-system"
Deleting Provider="control-plane-kubeadm" Version="" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.10" TargetNamespace="capi-kubeadm-control-plane-system"
Deleting Provider="infrastructure-vsphere" Version="" TargetNamespace="capv-system"
Installing Provider="infrastructure-vsphere" Version="v0.7.1" TargetNamespace="capv-system"
Management cluster providers upgraded successfully...
Upgrading management cluster kubernetes version...
Verifying kubernetes version...
Retrieving configuration for upgrade cluster...
Create InfrastructureTemplate for upgrade...
Upgrading control plane nodes...
Patching KubeadmControlPlane with the kubernetes version v1.19.1+vmware.2...
Waiting for kubernetes version to be updated for control plane nodes
Upgrading worker nodes...
Patching MachineDeployment with the kubernetes version v1.19.1+vmware.2...
Waiting for kubernetes version to be updated for worker nodes...
updating 'metadata/tkg' add-on...
Management cluster 'tkg-mgmt-vsphere-20200927183052' successfully upgraded to TKG version 'v1.2.0' with kubernetes version 'v1.19.1+vmware.2'

All finished

Congratulations, you’ve now got a Kubernetes cluster up and running on top of your VMware cluster. In the next post we’ll be looking at deploying PostgreSQL into our cluster ready for our instance of Zercurity.

If you’ve got stuck or have a few suggestions for us to add don’t hesitate to get in touch via our website or leave a comment below.
