"Cloud-like" Infrastructure at Home - Part 1: LoadBalancers on the Metal

By Calum MacRae

August 10, 2020

This is the introductory post to a series outlining how I achieve "Cloud-like" application deployments on my personal servers. If you're interested in Platform Engineering/SRE/DevOps, read on!

What does "Cloud-like" mean?

I'm using this phrase to express a set of capabilities that engineers have come to expect from cloud providers in their offerings for hosting infrastructure. Namely: load balancers, dynamic DNS, certificate leasing, and perhaps a few others.

Who's this post for?

I'm guessing by now you have at least some interest in the subject, so I'll state some assumptions:

  1. You're somewhat familiar with Kubernetes & its configuration with YAML
  2. You're somewhat familiar with basic networking concepts, like DHCP, DNS, TCP/IP, ARP
  3. You're not looking for a guide to set-up self-hosted Kubernetes (I'll write about how I do this in a future post) - we're going to be working with an already running cluster

All set then? Let's dive in!

What're we solving today?

LAN routable traffic to services hosted in Kubernetes

Say we have a web service we want to deploy - let's call it "coffee"

We deploy it into k8s (Kubernetes) and can access it by using

$ kubectl port-forward coffee-5b8f7c69bd-9x6sk 8080:80

Now we can visit localhost:8080/ to reach our service - great!

But we don't want to have to rely on kubectl to handle the proxying for us. What about other devices on the network that we want to be able to access the coffee service?

Well, once we've gotten through this post, we'll have a LAN-accessible web service we can reach via a segment of our LAN address space - just like any other physical device on the network - simply by deploying k8s manifests.

No need to pick out an IP address and set up static routes, no need for any iptables magic.

Accessing Kubernetes services via IP

When we deploy a pod of containers providing a network application, we can expose its port using a Service. Here's what a Service accompanying a Deployment for coffee might look like:

---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: coffee
spec:
  replicas: 1
  selector:
    matchLabels:
      app: coffee
  template:
    metadata:
      labels:
        app: coffee
    spec:
      containers:
      - name: coffee
        image: example/coffee
        ports:
        - name: api
          containerPort: 80

---
kind: Service
apiVersion: v1
metadata:
  name: coffee
spec:
  selector:
    app: coffee
  ports:
  - name: api
    port: 8080
    targetPort: api

After applying this manifest, our k8s cluster would have a coffee Service that's available within the cluster network. Other pods within the same namespace could simply call out to http://coffee:8080, and pods deployed in other namespaces could reach it via http://coffee.example:8080 (example being the namespace we deployed the coffee resources to).
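
A quick way to sanity-check that in-cluster path is to run a throwaway pod and hit the Service from inside the cluster - a minimal sketch, assuming the busybox image is pullable and coffee lives in the example namespace:

$ kubectl -n example run coffee-test --rm -it --restart=Never \
    --image=busybox -- wget -qO- http://coffee:8080/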

By default, we'll get a ClusterIP type Service. I won't go into much detail here about k8s Service objects, but it's useful to outline the available ServiceTypes, in particular "publishing" services.

Publishing ServiceTypes

Let's take a look at each type, leaning on excerpts from the official documentation.

ClusterIP: Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default ServiceType.

We can quite plainly see this is a no-go for what we want: we want our service to be reachable outside the cluster.

NodePort: Exposes the Service on each Node’s IP at a static port (the NodePort). A ClusterIP Service, to which the NodePort Service routes, is automatically created. You’ll be able to contact the NodePort Service, from outside the cluster, by requesting <NodeIP>:<NodePort>.

This looks like it meets our needs in terms of external (to the cluster, not our LAN) access. But on second thought, it raises the question: how do we know which node, and thus which NodeIP, we'll be using to reach the service? We don't want to have to maintain node taints/tolerations to schedule on specific nodes to achieve this.
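
For reference, a NodePort variant of coffee's Service would look roughly like the sketch below (the nodePort value is just an illustrative pick from the default 30000-32767 range; omit it and k8s will choose one), after which the service is reachable at http://<any-node-ip>:30080:

---
kind: Service
apiVersion: v1
metadata:
  name: coffee
spec:
  type: NodePort
  selector:
    app: coffee
  ports:
  - name: api
    port: 8080
    targetPort: api
    nodePort: 30080 # illustrative example value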

No dice.

ExternalName: Maps the Service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up.

Hmm, this doesn't seem like a good fit either. There's no mention of traffic being routable for cluster external networking. It's simply about mapping external DNS records to services within the cluster, for other cluster residents.
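
Purely for illustration, an ExternalName Service is little more than a DNS alias - something like this hypothetical coffee-upstream, which resolves to an external hostname when queried from inside the cluster:

---
kind: Service
apiVersion: v1
metadata:
  name: coffee-upstream
spec:
  type: ExternalName
  externalName: coffee.example.com # hypothetical external host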

Next!

LoadBalancer: Exposes the Service externally using a cloud provider’s load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.

This sounds perfect! Except… we're not deploying to some cloud environment. We don't have a cloud provider's solution to provision load balancers.

…Or do we?

LoadBalancers on the metal

Turns out a service of type LoadBalancer is actually achievable, without the cloud provider!

MetalLB is a project that aims to bring the LoadBalancer ServiceType to k8s clusters provisioned on bare metal - and does so very well.

Deployment & configuration

MetalLB has two modes of operation: BGP & Layer2. Both are self-explanatory if you're familiar with the respective network protocols - I won't be expanding on either, as it's out of scope for this write-up.

For our implementation, we're going to keep things simple and go with Layer2.

MetalLB is deployed entirely with native k8s manifests. How you choose to apply that YAML - plain kubectl, Kustomize, or the Helm chart - is up to you.

I personally use ArgoCD to deploy the Helm chart, but details on that are for another post.

The installation documentation is straightforward and can be found on the MetalLB website.
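
For reference, at the time of writing the raw-manifest route looks roughly like this (the pinned version here is just an example - check the MetalLB docs for the current release; the memberlist secret is only needed on first install):

$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/namespace.yaml
$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
$ kubectl create secret generic -n metallb-system memberlist \
    --from-literal=secretkey="$(openssl rand -base64 128)"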

Let's focus on the configuration. MetalLB evaluates its configuration through a ConfigMap. For the Layer2 configuration, it's as simple as picking out an IP range you want your services to be allocated an address from.

My LAN is 10.0.0.0/16, and I opted to slice out 10.0.42.0/24 for MetalLB. So my configuration looks like:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
      - name: default
        protocol: layer2
        addresses:
          - 10.0.42.0/24    
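
Assuming that's saved as metallb-config.yaml (the filename is arbitrary), applying it is a one-liner:

$ kubectl apply -f metallb-config.yaml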

If we configure MetalLB with this, we can then deploy Service objects with spec.type: LoadBalancer in their manifest and expect to get an IP leased from the 10.0.42.0/24 pool.

Take it for a spin

Let's update our coffee Service manifest to set spec.type to LoadBalancer:

---
kind: Service
apiVersion: v1
metadata:
  name: coffee
spec:
  selector:
    app: coffee
  ports:
  - name: api
    port: 8080
    targetPort: api
  type: LoadBalancer # <- here

Applying this will yield something like

$ kubectl describe svc coffee
Name:                     coffee
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app=coffee
Type:                     LoadBalancer
IP:                       172.16.57.10
LoadBalancer Ingress:     10.0.42.0
Port:                     api  8080/TCP
TargetPort:               api/TCP
NodePort:                 api  30786/TCP
Endpoints:                192.168.2.7:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason       Age    From                Message
  ----    ------       ----   ----                -------
  Normal  IPAllocated  9s     metallb-controller  Assigned IP "10.0.42.0"

For those of us who aren't networking wizards, something may look a bit strange here: 10.0.42.0. Rest assured, that's actually a routable address - since the /24 pool is carved out of a larger /16, 10.0.42.0 is just another host address rather than a network address. Most day-to-day DHCP configurations start their allocation range at X.X.X.1, which is why we rarely see a .0 in use, but there's no need for that here.
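
As an aside, if you'd rather have a predictable (or prettier) address, MetalLB also honours spec.loadBalancerIP, so long as the requested IP falls within one of its configured pools - a minimal sketch, assuming 10.0.42.10 is free:

---
kind: Service
apiVersion: v1
metadata:
  name: coffee
spec:
  type: LoadBalancer
  loadBalancerIP: 10.0.42.10 # must fall within a pool MetalLB manages
  selector:
    app: coffee
  ports:
  - name: api
    port: 8080
    targetPort: api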

So, if our service is alive, we should be able to establish a TCP session with it. Let's try with good old netcat

$ nc -vz 10.0.42.0 8080
Connection to 10.0.42.0 port 8080 [tcp/http] succeeded!
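
We can go one step further and speak HTTP to it from any machine on the LAN (what comes back depends, of course, on what the coffee image actually serves):

$ curl -i http://10.0.42.0:8080/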

Woo! If we look a little closer with a simple ping, we'll see something interesting:

$ ping 10.0.42.0
PING 10.0.42.0 (10.0.42.0): 56 data bytes
Request timeout for icmp_seq 0
92 bytes from 10.0.10.2: Redirect Host(New addr: 10.0.42.0)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 903f   0 0000  3f  01 ac67 10.0.1.3  10.0.42.0

The service itself doesn't answer the ICMP request - the interesting part is what we got back instead:

92 bytes from 10.0.10.2: Redirect Host(New addr: 10.0.42.0)

That address, 10.0.10.2, is the IP for compute2, an active compute node in my cluster. Let's take a look at where coffee's pod was scheduled (kubectl get pods -o wide):

NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE
coffee-65b9b69679-96kl8   1/1     Running   0          30m   192.168.2.7   compute2.cmacr.ae

There it is, on compute2. Since this is layer 2 mode, let's see what those two addresses look like in the ARP table:

$ arp -a
? (10.0.10.2) at b8:ae:ed:7d:19:6 on en0 ifscope [ethernet]
? (10.0.42.0) at b8:ae:ed:7d:19:6 on en0 ifscope [ethernet]

And there you have it: the same MAC address. This is the crux of MetalLB's layer 2 mode: the speaker on the node that currently owns the service (compute2 here) answers ARP requests for 10.0.42.0 with that node's own MAC address, so traffic for the service IP lands on that node and kube-proxy forwards it on to the coffee pod. Hopefully, having stepped through the flow of traffic, it's now a little clearer how layer 2 mode in MetalLB works.

Our cluster is now set up to receive external traffic from the rest of our LAN. Perfect! Though the IPs MetalLB is leasing are dynamic - we don't want to have to keep track of which service has which IP by asking k8s…

Watch out for 'Part 2: Hosting your own dynamic DNS solution'

Next time, I'll detail how I simplify reaching these services with human-friendly DNS records.

Thanks for reading!
