Create a standalone Kubernetes cluster with Vagrant

Minikube is good for development, but you might want to test software on a production-like cluster that you can still run on your laptop. For instance, you might want to test some distributed software on Kubernetes. In that case, I find it very useful to run a cluster with multiple workers using virtual machines.

Before writing this article, I looked for a solution on the Internet. I found a lot of similar projects, but most of them are either old, do more than I need, or are missing something I want. Eventually, I decided to create my own automation project, borrowing some good ideas from others. I am very grateful to the people who shared their code on GitHub, and I hope that my work will help many others.

TL;DR

Do you want to use Vagrant and Ansible to provision a Kubernetes cluster for development?

See repository https://github.com/nextbreakpoint/kubernetes-playground

Provision a Kubernetes cluster

I am going to show you how to use Vagrant and Ansible to provision a Kubernetes cluster for development. I'll explain the essential steps for getting a working Kubernetes environment, but you can find the complete code in the GitHub repository, which includes some additional components.

Where to start?

Download and install Vagrant. I am using version 2.2.2.

Download and install VirtualBox (with extension pack). I am using version 6.0.0.

I work on a Mac, but the process should be the same on Linux. I am not sure about Windows.
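
Once both tools are installed, you can quickly check that they are on your path and which versions you have (your version numbers will differ over time):

vagrant --version
VBoxManage --version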

Create the machines

In order to create a virtual machine with Vagrant, we need a Vagrantfile. We can create a minimal Vagrantfile with the command:

vagrant init

A Vagrantfile for provisioning a machine with Ubuntu Xenial, hostname 'server', a fixed IP address, 2 CPUs, and 2 GB of memory looks like this:

Vagrant.configure(2) do |config|
  config.vm.define "server" do |s|
    s.vm.box = "ubuntu/xenial64"
    s.vm.hostname = "server"
    s.vm.network "private_network",
      ip: "192.168.1.10",
      netmask: "255.255.255.0",
      auto_config: true
    s.vm.provider "virtualbox" do |v|
      v.name = "server"
      v.cpus = 2
      v.memory = 2048
      v.gui = false
    end
  end
end

The Vagrantfile is written in Ruby, so we might want to look at Ruby's documentation if needed. I don't know Ruby at all, but I can still understand the Vagrantfile.
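
Vagrant can also validate the file for us, which is a quick way to catch typos before booting anything:

vagrant validate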

A Vagrantfile for provisioning multiple machines with different hostnames and IP addresses looks like this:

Vagrant.configure(2) do |config|
  (1..3).each do |i|
    config.vm.define "k8s#{i}" do |s|
      s.vm.box = "ubuntu/xenial64"
      s.vm.hostname = "k8s#{i}"
      s.vm.network "private_network",
        ip: "192.168.1.#{i+9}",
        netmask: "255.255.255.0",
        auto_config: true
      s.vm.provider "virtualbox" do |v|
        v.name = "k8s#{i}"
        v.cpus = 2
        v.memory = 2048
        v.gui = false
      end
    end
  end
end

The script contains a loop and some expressions for interpolating the name and the IP address. Honestly, I had to Google to find out how to do that.

We are now able to create multiple machines, but we are not provisioning the machines with any software or configuration yet.

Provision the machines

A Vagrantfile for provisioning the machines with a shell script looks like:

$bootstrap_ansible = <<-SHELL
echo "Installing Ansible..."
sudo apt-get update -y
sudo apt-get install -y software-properties-common
sudo apt-add-repository ppa:ansible/ansible
sudo apt-get update -y
sudo apt-get install -y ansible apt-transport-https
SHELL
Vagrant.configure(2) do |config|
  (1..3).each do |i|
    config.vm.define "k8s#{i}" do |s|
      s.vm.box = "ubuntu/xenial64"
      s.vm.hostname = "k8s#{i}"
      s.vm.network "private_network",
        ip: "192.168.1.#{i+9}",
        netmask: "255.255.255.0",
        auto_config: true
      s.vm.provider "virtualbox" do |v|
        v.name = "k8s#{i}"
        v.cpus = 2
        v.memory = 2048
        v.gui = false
      end
      s.vm.provision :shell,
        inline: $bootstrap_ansible
    end
  end
end

The script is defined as a variable and executed as an inline provisioning script.

I have intentionally chosen to install Ansible in this example, because that is the tool I recommend. I could have used a bash script instead, but the result would have been less readable and less maintainable.

Given the Vagrantfile above, we can invoke Ansible by adding another inline provisioning step like this:

s.vm.provision :shell,
  inline: "PYTHONUNBUFFERED=1 ansible-playbook /vagrant/ansible/playbook.yml -c local"

The playbook file must be created in a directory called 'ansible' next to the Vagrantfile. The directory containing the Vagrantfile is automatically mounted on each virtual machine under the path /vagrant, so the playbook is available at /vagrant/ansible/playbook.yml inside the machine.
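
To make the paths concrete, this is the minimal layout I am assuming on the host at this point (the file names are just an example):

./Vagrantfile
./ansible/playbook.yml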

Since we want to provision a Kubernetes cluster with one master node and some worker nodes, we need to create two playbooks: one for the master node and one for the worker nodes. We can select the playbook for each node like this:

if i == 1
  s.vm.provision :shell,
    inline: "PYTHONUNBUFFERED=1 ansible-playbook /vagrant/ansible/k8s-master.yml -c local"
else
  s.vm.provision :shell,
    inline: "PYTHONUNBUFFERED=1 ansible-playbook /vagrant/ansible/k8s-worker.yml -c local"
end

Let's assume for a moment that we have already created the playbooks (we will come back to this shortly). We have created a Vagrantfile and some provisioning scripts. But how do we create the machines?

We just have to run one Vagrant command to launch and provision the machines using the Ansible playbooks:

vagrant up

We can also boot each machine separately. For instance, we might want to run only the master node:

vagrant up k8s1

We can check the status of one machine or multiple machines:

vagrant status k8s1
vagrant status k8s1 k8s2 k8s3

We can stop (without destroying) one machine or all machines:

vagrant halt k8s1
vagrant halt

Finally, we can destroy one machine or all machines:

vagrant destroy -f k8s1
vagrant destroy -f
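
Two more commands I find handy while iterating on the provisioning scripts are provision and reload, which re-run the provisioning without destroying the machines:

vagrant provision k8s1
vagrant reload k8s1 --provision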

For a complete list of commands, see the Vagrant documentation.

Now that we know how to use Vagrant, we can dive into the playbooks for creating the Kubernetes cluster.

Create the playbooks

The playbook for the master node looks like:

- hosts: localhost
  remote_user: vagrant
  serial: 1
  roles:
    - k8s-base
    - k8s-master

The playbook for the worker node looks like:

- hosts: localhost
  remote_user: vagrant
  serial: 1
  roles:
    - k8s-base
    - k8s-worker

But what are those roles referenced in the playbook?

The roles represent subdirectories where Ansible will look for scripts:

ansible/roles/k8s-base
ansible/roles/k8s-master
ansible/roles/k8s-worker

You can guess that the directory k8s-base contains everything that is common between the master node and the worker nodes, while the directory k8s-master contains only the tasks for the master node and the directory k8s-worker only the tasks for the worker nodes.

The roles are applied sequentially, and the tasks are executed as the user vagrant (the default user on a Vagrant machine). Note that the host is localhost because the playbook is executed locally on each Vagrant machine.

Each role directory should contain a tasks directory and a files directory, like this:

ansible/roles/k8s-base/tasks
ansible/roles/k8s-base/files

The tasks directory should contain a file with the tasks we want to execute for that role. The files directory should contain the files we want to copy onto the machine for that role.

Let's create a file main.yml in ansible/roles/k8s-base/tasks which contains:

---
- name: Remove Default Host Entry
  become: yes
  lineinfile:
    dest: /etc/hosts
    regexp: '^127\.0\.1\.1\s+k8s.*$'
    state: absent

- name: Ensure Hosts File
  become: yes
  lineinfile:
    dest: /etc/hosts
    line: "{{ item.ip }} {{ item.name }}"
  with_items:
    - { ip: "192.168.1.10", name: "k8s1" }
    - { ip: "192.168.1.11", name: "k8s2" }
    - { ip: "192.168.1.12", name: "k8s3" }

When we run the playbook, Ansible executes the tasks of the role k8s-base, including the ones above. They modify the file /etc/hosts, removing the default hostname mapping and adding the hostnames of the three Vagrant machines. Since the playbook runs as the user vagrant, we must become root in order to modify /etc/hosts.

For a complete list of modules, see the Ansible documentation.
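
If Ansible is also installed on the host, we can run a quick sanity check on the playbooks without booting any machine (just a sketch, assuming the directory layout described above):

ansible-playbook --syntax-check ansible/k8s-master.yml
ansible-playbook --syntax-check ansible/k8s-worker.yml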

Now that we have all the tools and we understand how to create Vagrant machines and how to run an Ansible playbook, let's see how we can create the Kubernetes cluster.

Base role

We start by ensuring that the hostname is not already mapped to the loopback address:

- name: Remove Default Host Entry
  become: yes
  lineinfile:
    dest: /etc/hosts
    regexp: '^127\.0\.1\.1\s+k8s.*$'
    state: absent

Then we define the hostnames of all nodes using the private IP addresses of the Vagrant machines:

- name: Ensure Hosts File
  become: yes
  lineinfile:
    dest: /etc/hosts
    line: "{{ item.ip }} {{ item.name }}"
  with_items:
    - { ip: "192.168.1.10", name: "k8s1" }
    - { ip: "192.168.1.11", name: "k8s2" }
    - { ip: "192.168.1.12", name: "k8s3" }
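
With the entries above, /etc/hosts on every machine should end up containing something like this:

192.168.1.10 k8s1
192.168.1.11 k8s2
192.168.1.12 k8s3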

Then we add the Docker and Kubernetes APT repositories:

- name: Ensure Docker Apt Key
  become: yes
  apt_key:
    url: https://download.docker.com/linux/ubuntu/gpg
    state: present

- name: Ensure Docker Repository
  become: yes
  apt_repository:
    repo: 'deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable'
    state: present
    update_cache: yes

- name: Ensure Google Cloud Apt Key
  become: yes
  apt_key:
    url: https://packages.cloud.google.com/apt/doc/apt-key.gpg
    state: present

- name: Ensure Kubernetes Repository
  become: yes
  apt_repository:
    repo: 'deb http://apt.kubernetes.io/ kubernetes-xenial main'
    state: present
    update_cache: yes

Then we install the required packages, including Docker and kubeadm:

- name: Ensure Base Kubernetes
  become: yes
  apt:
    name: ['curl', 'apt-transport-https', 'docker-ce', 'kubelet', 'kubeadm', 'kubectl', 'kubernetes-cni', 'jq', 'bc', 'gawk']
    state: latest

Then we must ensure that the user vagrant belongs to the docker group:

- name: Ensure Docker Group
  become: yes
  group:
    name: docker
    state: present

- name: Ensure User in Docker Group
  become: yes
  user:
    name: vagrant
    groups: docker
    append: yes

Then we must ensure that swap is disabled on the machine:

- name: Ensure swap is off
  become: yes
  command: "swapoff -a"

- name: Remove swap from fstab
  become: yes
  lineinfile:
    dest: /etc/fstab
    regexp: 'swap'
    state: absent

Finally, we can copy any files we want onto the machine, including utility scripts like this one:

- name: Ensure Kubernetes Cleanup
  become: yes
  copy:
    src: files/clean-k8s
    dest: /usr/local/bin
    mode: 0755
    owner: root
    group: root

The script clean-k8s, which resets the state of Kubernetes, looks like this:

#!/bin/bash
# Stop kubelet, remove all containers, and unmount the kubelet volumes
systemctl stop kubelet
docker rm -f $(docker ps -q); mount | grep "/var/lib/kubelet/*" | awk '{print $3}' | xargs umount 1>/dev/null 2>/dev/null
# Remove the state of Kubernetes, etcd, and CNI
rm -rf /var/lib/kubelet /etc/kubernetes /var/lib/etcd /etc/cni
mkdir -p /etc/kubernetes
# Delete the bridge and CNI network interfaces
ip link set cbr0 down; ip link del cbr0
ip link set cni0 down; ip link del cni0
systemctl start kubelet
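
The script ends up in /usr/local/bin on every node, so when a machine gets into a bad state I can log into it and reset Kubernetes (just an example of how I use it, not a required step):

vagrant ssh k8s2
sudo clean-k8s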

Master role

We start by pulling the Docker images required to set up Kubernetes:

- name: Pull Docker images
  become: yes
  command: "kubeadm config images pull"

Then we initialise the master node using kubeadm:

- name: Ensure kubeadm initialization
  become: yes
  command: "kubeadm init --pod-network-cidr=172.43.0.0/16 --apiserver-advertise-address=192.168.1.10"

Note that we pass the API server address explicitly because kubelet doesn't automatically detect the correct address when running on a Vagrant machine. The pod network CIDR can be any private network, since we will force the same value into the network plugin later.

After initialising the master node, kubeadm generates a configuration file which must be used to connect to the Kubernetes cluster. We copy the configuration file to the home directory of the user vagrant and load it in the bash profile:

- name: Copy config to home directory
  become: yes
  copy:
    src: /etc/kubernetes/admin.conf
    dest: /home/vagrant/admin.conf
    owner: vagrant
    group: vagrant
    mode: 0600

- name: Update Environment
  become: yes
  lineinfile:
    path: /home/vagrant/.bashrc
    regexp: '^export KUBECONFIG='
    line: 'export KUBECONFIG=/home/vagrant/admin.conf'
    state: present

We can also copy the configuration file to the host, so we will be able to copy it to the other VMs and talk to Kubernetes from the host:

- name: Copy config to /Vagrant for other VMs
  become: yes
  copy:
    src: /etc/kubernetes/admin.conf
    dest: /vagrant/admin.conf
    owner: vagrant
    group: vagrant
    mode: 0600

The next step is to download the files for configuring a network plugin. There are many plugins available, and I tried a few of them without success. Fortunately, I found that Calico works fine on a Vagrant machine:

- name: Ensure Calico network file
  become: yes
  get_url:
    url: https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
    dest: /var/tmp/calico.yaml
    mode: 0444

- name: Ensure Calico RBAC file
  become: yes
  get_url:
    url: https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
    dest: /var/tmp/rbac-kdd.yaml
    mode: 0444

We must modify the default configuration to use the correct CIDR for the pod subnet:

- name: Ensure Calico CIDR
  become: yes
  replace:
    path: /var/tmp/calico.yaml
    regexp: '192.168.0.0\/16'
    replace: '172.43.0.0/16'
    backup: yes

Then we provision the machine with some scripts:

- name: Ensure Utility Scripts
  become: yes
  copy:
    src: "files/{{ item }}"
    dest: /usr/local/bin/
    owner: root
    group: root
    mode: 0755
  with_items:
    - "start-calico"
    - "taint-nodes"
    - "kubeadm-hash"
    - "kubeadm-token"
    - "create-join-script"

The script start-calico looks like:

#!/bin/bash
kubectl create -f /var/tmp/rbac-kdd.yaml
kubectl create -f /var/tmp/calico.yaml

The script taint-nodes looks like:

#!/bin/bash
kubectl taint nodes --all node-role.kubernetes.io/master-

The script kubeadm-hash looks like:

#!/bin/bash
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

The script kubeadm-token looks like:

#!/bin/bash
kubeadm token list | grep "authentication" | awk '{ print $1 }'

The script create-join-script looks like:

#!/bin/bash
echo "kubeadm join 192.168.1.10:6443 --token $(kubeadm-token) --discovery-token-ca-cert-hash sha256:$(kubeadm-hash)" > /home/vagrant/join.sh

Finally, we execute the script create-join-script:

- name: Create join script
  become: yes
  command: "create-join-script"
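
The generated join.sh contains a single kubeadm join command; the token and hash below are placeholders for the values generated on your cluster:

kubeadm join 192.168.1.10:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>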

Then we copy the generated script to the host:

- name: Copy join script
  become: yes
  copy:
    src: /home/vagrant/join.sh
    dest: /vagrant/join.sh
    owner: vagrant
    group: vagrant
    mode: 0600

Worker role

We start by copying the configuration file to the vagrant user's home directory and loading the configuration in the bash profile:

- name: Copy config to home directory
  become: yes
  copy:
    src: /vagrant/admin.conf
    dest: /home/vagrant/admin.conf
    owner: vagrant
    group: vagrant
    mode: 0600

- name: Update Environment
  become: yes
  lineinfile:
    path: /home/vagrant/.bashrc
    regexp: '^export KUBECONFIG='
    line: 'export KUBECONFIG=/home/vagrant/admin.conf'
    state: present

Then we copy the join script from the host:

- name: Copy join script
  become: yes
  copy:
    src: /vagrant/join.sh
    dest: /home/vagrant/join.sh
    owner: vagrant
    group: vagrant
    mode: 0700

Finally, we join the cluster using the join script:

- name: Join Kubernetes Cluster
  become: yes
  command: "sh /home/vagrant/join.sh"

We have now created the playbooks for each role and we are ready to create our cluster.

However, there is one step missing which prevents Kubernetes from working properly. As mentioned before, kubelet is not able to detect the correct IP address when running in a Vagrant machine. We have already forced the address on the master node when running kubeadm init, but unfortunately that is not enough. We have to set the IP address on the worker nodes as well.

We can set kubelet's node IP in the Vagrantfile:

$restart_kubelet = <<-SHELL
echo "Restarting Kubelet..."
sudo systemctl daemon-reload
sudo systemctl restart kubelet
SHELL
[...]
s.vm.provision :shell,
  inline: "echo 'KUBELET_EXTRA_ARGS=--node-ip=192.168.1.#{i+9}' | sudo tee /etc/default/kubelet"
s.vm.provision :shell,
  inline: $restart_kubelet

Note that we have to force a restart of kubelet after changing the variable KUBELET_EXTRA_ARGS.
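
Once the cluster is running, we can double check that kubelet picked up the right address on every node by looking at the INTERNAL-IP column, which should match the addresses assigned in the Vagrantfile:

kubectl get nodes -o wide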

Create the cluster

We are ready to create the Kubernetes cluster with the command:

vagrant up

When the cluster is ready, we have to log into the master node:

vagrant ssh k8s1

Then we must start the network plugin with the command:

start-calico

After some time, we can list the nodes with the command:

kubectl get nodes

It should produce an output like this:

NAME   STATUS   ROLES    AGE     VERSION
k8s1   Ready    master   3h20m   v1.13.2
k8s2   Ready    <none>   3h19m   v1.13.2
k8s3   Ready    <none>   3h17m   v1.13.2

We can also list the pods with the command:

kubectl get pods --all-namespaces

It should produce an output like this:

NAMESPACE     NAME                                   READY   STATUS    RESTARTS   AGE
kube-system   calico-node-65f2k                      2/2     Running   0          106m
kube-system   calico-node-c4z8r                      2/2     Running   0          106m
kube-system   calico-node-r54wn                      2/2     Running   0          106m
kube-system   coredns-86c58d9df4-dlcfs               1/1     Running   0          3h20m
kube-system   coredns-86c58d9df4-t5hbr               1/1     Running   0          3h20m
kube-system   etcd-k8s1                              1/1     Running   0          3h20m
kube-system   kube-apiserver-k8s1                    1/1     Running   0          3h20m
kube-system   kube-controller-manager-k8s1           1/1     Running   0          3h20m
kube-system   kube-proxy-5lbkn                       1/1     Running   0          3h17m
kube-system   kube-proxy-67lnb                       1/1     Running   0          3h19m
kube-system   kube-proxy-z8c7m                       1/1     Running   0          3h20m
kube-system   kube-scheduler-k8s1                    1/1     Running   0          3h20m

Schedule workloads on the master node

By default, we can't schedule workloads on the master node. However, we might want to (for instance, to avoid adding an extra virtual machine when we need three nodes for some workload).

In order to allow workloads on the master node, we must remove the master taint from the nodes with the command:

taint-nodes

After removing the taint, we should be able to schedule pods on the master node.
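
We can verify that the taint is gone by inspecting the master node; the Taints field should no longer contain node-role.kubernetes.io/master:NoSchedule:

kubectl describe node k8s1 | grep Taints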

Connect from Host

We can open a terminal on the host and connect to Kubernetes using the configuration file (admin.conf) that we exported from the Vagrant machine:

kubectl --kubeconfig=$(pwd)/admin.conf get pods --all-namespaces

It should produce the same output as above.
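
To avoid passing --kubeconfig on every command, we can export the KUBECONFIG variable in the shell:

export KUBECONFIG=$(pwd)/admin.conf
kubectl get nodes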

Conclusion

We have completed the configuration of the Kubernetes cluster, and we can start running our workloads.

Since we are running a cluster with multiple workers, we can run distributed systems such as Kafka, Cassandra, and Elasticsearch, and we can simulate situations where one node is down or we lose the data on one node. Just keep in mind that the machines have a limited amount of memory and share the CPUs of the same host.
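
For example, a quick way to simulate a node failure is to halt one of the workers from the host and, after a while, watch the node become NotReady from the master:

vagrant halt k8s3
vagrant ssh k8s1
kubectl get nodes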

This concludes my article, but if you want to see how to install the Kubernetes Dashboard or how to install Helm and Tiller, please have a look at my repository on GitHub:

https://github.com/nextbreakpoint/kubernetes-playground