
Kubernetes Short Notes(4)

  • Ops
tags: k8s


Networking Basics for Linux

Network Switch

A switch is a device in a computer network that connects other devices together; it can only enable communication within a single network

Host A [eth0] <--> Switch <--> [eth0] Host B


A router is a device/service that provides the function of routing IP packets between networks

Switch <--> Router <--> Switch


A gateway (in network terms) is a router that acts as the access point from one network to another

Default Gateway

If none of the forwarding rules in the routing table is appropriate for a given destination address, the default gateway is chosen as the router of last resort

Forwarding packets between interfaces

By default in Linux, packets are not forwarded from one interface to another, for security reasons

Explicitly allow it, then persist the setting
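A sketch of both steps (run as root; `net.ipv4.ip_forward` is the standard Linux sysctl key):

```shell
# Allow forwarding for the running kernel
sysctl -w net.ipv4.ip_forward=1

# Persist the setting across reboots
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
sysctl -p
```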


Translate host names to IP addresses by configuring /etc/hosts
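For example, a hypothetical entry mapping a name to an IP:

```
# /etc/hosts (example entry)
192.168.1.11    db
```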

When an environment has too many entries and IP addresses are not persistent, we need a DNS server

The host looks up an entry in /etc/hosts first, then queries DNS. This order can be changed in the configuration file /etc/nsswitch.conf

You can configure the DNS server to forward unknown host names to a public name server on the Internet, for example to reach www.google.com:

private DNS → Root DNS → .com DNS → google DNS → cache the result

When looking for a host in the same domain, we want to simply use the host name (e.g. web) rather than the full name, so we specify the domain names to append in /etc/resolv.conf
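A hypothetical resolv.conf illustrating the nameserver and search entries:

```
# /etc/resolv.conf (example values)
nameserver 192.168.1.100
search mycompany.com prod.mycompany.com
```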

Records are stored in DNS with specific types:

  • A: ipv4
  • AAAA: ipv6
  • CNAME: name to name mapping

You can use tools like nslookup and dig to debug; note that nslookup only queries the DNS server, not /etc/hosts

There are plenty of DNS solutions, such as CoreDNS. Besides configuration files, CoreDNS supports other ways of configuring DNS entries through plugins, such as the kubernetes plugin

Network Namespace

A namespace is a way of scoping a particular set of identifiers

Linux provides namespaces for networking and processes. If a process is running within a process namespace, it can only see and communicate with other processes in the same namespace

Linux starts up with a default network namespace

Each network namespace has its own routing table and has its own set of iptables

Connect namespaces together using a virtual Ethernet pair (or virtual cable, pipe)
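A sketch of wiring two namespaces together (namespace names and addresses are made up; run as root):

```shell
# Create two network namespaces
ip netns add red
ip netns add blue

# Create a virtual Ethernet pair (a "cable" with two ends)
ip link add veth-red type veth peer name veth-blue

# Attach one end to each namespace
ip link set veth-red netns red
ip link set veth-blue netns blue

# Assign addresses and bring the interfaces up
ip -n red  addr add 192.168.15.1/24 dev veth-red
ip -n blue addr add 192.168.15.2/24 dev veth-blue
ip -n red  link set veth-red up
ip -n blue link set veth-blue up

# Test connectivity between the namespaces
ip netns exec red ping -c1 192.168.15.2
```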

When more namespaces need to be connected, use a virtual switch to create a virtual network. There are a few solutions:

  • Linux Bridge
  • Open vSwitch


When a private virtual network needs to reach the outer network, it needs a gateway; the host is that gateway

For the destination network to respond, enable NAT on the host acting as the gateway.

Add a new rule in the NAT table's POSTROUTING chain to masquerade, i.e. replace the source address on all packets coming from the source network with the host's own IP address.

Thus anyone receiving these packets outside the network will think that they are coming from the host and not from within the namespaces

Inside the namespaces, add a default route via the host to reach the outside world
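A sketch of both steps, with example addresses for the private network and the host's bridge:

```shell
# Host: masquerade packets leaving the private 192.168.15.0/24 network
iptables -t nat -A POSTROUTING -s 192.168.15.0/24 -j MASQUERADE

# Namespace: send everything non-local via the host's bridge address
ip netns exec blue ip route add default via 192.168.15.5
```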

For the outside world to reach a namespace in a private network, add a port forwarding rule using iptables saying that any traffic coming to port 80 on the host is to be forwarded to port 80 on the IP assigned to the namespace
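A sketch of such a rule (the namespace IP is an example):

```shell
# Forward traffic arriving on host port 80 to the namespace's IP
iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 192.168.15.2:80
```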

Docker Networking

  • none: disable the networking stack on a container
  • host: remove network isolation between the container and the Docker host, and use the host’s networking directly
  • bridge: connect containers to the same bridge network to communicate

When Docker is installed on the host it creates an internal private network called bridge by default. On the host the network is created by the name docker0.

As mentioned above, docker0 is the virtual switch for the virtual Docker network; it is built with the same approach:
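You can see the bridge from both the Docker and the host point of view:

```shell
# The default bridge network, and the docker0 interface backing it
docker network ls
ip link show docker0
ip addr show docker0
```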


When we run a container, Docker creates a network namespace for it and connects it to the bridge

Port forwarding lets outside-world traffic reach a container through the host

To do that, Docker adds a rule to the DOCKER chain with the destination set to the container's IP

List the rules that Docker creates in iptables

List the listening ports on the host
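Both can be inspected like this (run as root):

```shell
# NAT rules Docker created (look at the DOCKER chain)
iptables -nvL -t nat

# Ports the docker-proxy listens on
netstat -nplt | grep docker
```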

Networking in Kubernetes

Container Network Interface(CNI)

… many container runtimes and orchestrators will seek to solve the same problem of making the network layer pluggable. To avoid duplication, we think it is prudent to define a common interface between the network plugins and container execution …

For container runtime:

  • Container Runtime must create network namespace
  • Identify network the container must attach to
  • Container Runtime to invoke Network Plugin (e.g. bridge) when container is added/deleted
  • JSON format configuration

For plugin:

  • Must support command line arguments ADD/DEL/CHECK
  • Must support parameters container ID, network ns …
  • Must manage IP Address assignment to PODs
  • Must return results in a specific format

CNI ships with these plugins:

  • VLAN
  • DHCP
  • host-local

Other plugins are supported by third-party organizations; all of these network providers implement the CNI standard:

  • Weaveworks (Weave Net)
  • Flannel
  • Cilium
  • VMware NSX
  • Calico
  • Infoblox

Docker does not implement CNI. Docker has its own standard known as the Container Network Model (CNM), which aims at solving container networking challenges similar to CNI but with some differences. Due to those differences, CNI plugins don’t natively integrate with Docker

You’ll need to work around it yourself if you want to run a Docker container with a network plugin that follows CNI: for example, create the container without any network configuration and then manually invoke the bridge plugin yourself. That is pretty much how Kubernetes does it.

Cluster Node Networking

Each node in the cluster must have at least one interface with an address configured. The hosts must have a unique hostname and a unique MAC address.

Some ports used by the control plane components need to be opened:

  • 2379 on master nodes for ETCD clients
  • 2380 between master nodes for ETCD peer communication (multi-master)
  • 6443 on master node for kube-api
  • 10250 on master/worker nodes for kubelet
  • 10251 on master node for kube-scheduler
  • 10252 on master node for kube-controller-manager
  • 30000-32767 on worker nodes for NodePort services


Pod Layer Networking

Kubernetes does not come with a built-in solution for POD networking; it expects you to implement a network solution that fits its clearly laid out requirements for POD networking:

  • every POD should have its own unique IP
  • every POD should be able to reach every other POD within the same node
  • every POD should be able to reach every other POD across the nodes in the cluster

A network configuration script should follow the CNI standard; kubelet then executes it when starting a container, invoking the script with the add command plus the container and namespace details
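The sketch below is pseudocode only: the script name and every step are placeholders, but it shows the shape of what kubelet expects when it invokes a plugin with the add command.

```
#!/bin/bash
# net-script.sh (hypothetical) -- kubelet invokes it as: ./net-script.sh add <container> <namespace>
case "$1" in
  add)
    # create a veth pair; attach one end to the bridge, the other to the container namespace
    # assign an IP address from the managed range; bring the interface up
    # return the assigned IP in CNI's JSON result format on stdout
    ;;
  del)
    # release the IP address and delete the veth pair
    ;;
esac
```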

CNI in Kubernetes

The CNI plugin is invoked by the component within Kubernetes that is responsible for creating containers: kubelet

Find the CNI binary and configuration directories

List the CNI plugins

View the configuration file
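For a typical setup (these paths are the common defaults; the file name is an example):

```shell
# CNI plugin binaries
ls /opt/cni/bin

# Configuration files read by kubelet (the first file in alphabetical order is used)
ls /etc/cni/net.d

# View a configuration
cat /etc/cni/net.d/10-bridge.conf
```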

The isGateway defines whether the bridge network interface should get an IP address assigned so it can act as a gateway.

The ipMasq field defines whether a NAT rule should be added for IP masquerading.

The ipam section defines the IPAM configuration: the subnet or range of IP addresses that will be assigned to pods, and any necessary routes. The type host-local indicates that the IP addresses are managed locally on this host, unlike a DHCP server maintaining them remotely. The type can also be set to dhcp to use an external DHCP server.
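A bridge plugin configuration illustrating these fields might look like this (the name and subnet are examples):

```json
{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/16",
    "routes": [{ "dst": "0.0.0.0/0" }]
  }
}
```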

Weave Net

When the weave CNI plugin is deployed on a cluster, it deploys an agent or service on each node. They communicate with each other to exchange information regarding the nodes and networks and PODs within them.

Each agent or peer stores a topology of the entire setup, that way they know the pods and their IPs on the other nodes.

Weave creates its own bridge on the nodes and names it weave, then assigns an IP address to each network.

The path a packet takes to reach its destination depends on the routes configured on the container. Weave makes sure that PODs get the correct route configured to reach the agent, and the agent then takes care of routing to other PODs.

When a packet is sent from one pod to another on another node, weave intercepts the packet and identifies that it’s on a separate network. Weave then encapsulates this packet into a new one with new source and destination and sends it across the network. On the other side, the other weave agent retrieves the packet, decapsulates and routes the packet to the right POD.

The easiest way to deploy Weave is as a DaemonSet in the cluster, so a weave POD runs on all the nodes:

Inspect the configuration file

Inspect the weave bridge interface

Inspect the IP address range the weave network use
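A sketch of deploying and inspecting Weave (the install URL is the one documented by Weaveworks at the time and may have changed; pod names are examples):

```shell
# Deploy Weave Net as a DaemonSet
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

# The weave pods, one per node
kubectl get pods -n kube-system -l name=weave-net

# The weave bridge interface on a node
ip addr show weave

# The IP address range the weave network uses (pod name is an example)
kubectl logs -n kube-system weave-net-xxxxx weave | grep ipalloc-range
```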

IP Address Management(IPAM)

  • DHCP

  • static

  • host-local

    stores the state locally on the host filesystem, therefore ensuring uniqueness of IP addresses on a single host.

  • other, such as Weave’s own IPAM

    by default Weave allocates the IP range 10.32.0.0/12 for the entire network, giving PODs addresses from 10.32.0.1 to 10.47.255.254; the weave peers split that range equally and each node assigns addresses from its own portion

Service Networking

A Service is a cluster-wide concept: there is no server or process really listening on the IP of the service, and no namespace or interface for it; it’s just a virtual object

When we create a service object in Kubernetes, it is assigned an IP address from a pre-defined range. The kube-proxy component running on each node creates forwarding rules for that service IP, forwarding traffic from the service IP to the POD IPs

The kube-proxy supports different ways to create these rules (proxy-mode)

  • userspace
  • ipvs
  • iptables (default)

Inspect the service IP range

Inspect the proxy mode
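Both can be checked like this (file paths and pod names are examples and vary by setup):

```shell
# The service cluster IP range, from the kube-apiserver options
ps aux | grep kube-apiserver | grep -o "service-cluster-ip-range[^ ]*"

# The proxy mode, from the kube-proxy logs (pod name is an example)
kubectl logs -n kube-system kube-proxy-xxxxx | grep "Using"
```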

Cluster DNS Resolution

Kubernetes deploys a built-in DNS server by default (if you set up Kubernetes manually, you need to deploy it yourself). When a service is created, the Kubernetes DNS creates a record for it, mapping the service name to its IP address

For each namespace the DNS server creates a subdomain. All the services are grouped together into another subdomain called svc

All the services and PODs are grouped together into a root domain for the cluster, which is set to cluster.local by default

For example, curl a service named web-service in namespace apps:
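Assuming the default cluster.local root, all of these names resolve to the same service:

```shell
# Short form works thanks to the DNS search list; the last one is the FQDN
curl http://web-service.apps
curl http://web-service.apps.svc
curl http://web-service.apps.svc.cluster.local
```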

Records for PODs are not created by default, but we can enable that explicitly. Once enabled, records are created for PODs as well; they do not use the POD name, but a name generated by replacing the dots in the POD's IP address with dashes
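The transformation is just a character substitution:

```shell
# Derive the POD DNS label from its IP address: dots become dashes
echo "10.244.2.5" | tr '.' '-'
# → 10-244-2-5
```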

| Hostname    | Namespace | Type | Root          | IP Address |
| ----------- | --------- | ---- | ------------- | ---------- |
| web-service | apps      | svc  | cluster.local |            |
| 10-244-2-5  | apps      | pod  | cluster.local | 10.244.2.5 |

This is roughly how Kubernetes does it, except that it does not create entries mapping a POD's name to its IP address; it only does that for services

Prior to version v1.12 the DNS implemented by kubernetes was known as kube-dns. With Kubernetes version 1.12 the recommended DNS server is CoreDNS

CoreDNS is deployed as PODs (a Deployment with 2 replicas) in the cluster, running the CoreDNS executable.

Its config file, the Corefile, is passed to the POD as a ConfigMap object

When we deploy the CoreDNS solution, it also creates a service to make it available to other components within the cluster. The service is named kube-dns by default. The kubelet is responsible for configuring each POD's nameserver to the DNS server's IP

If you manually look up the service using nslookup or the host command, it returns the FQDN of the service. The short name resolves because the resolv.conf file also has a search entry, set to default.svc.cluster.local, svc.cluster.local and cluster.local

Notice that there are search entries only for services; to look up a pod you need to specify its FQDN
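For example (the service and pod names match the table above):

```shell
# Returns the FQDN, thanks to the search entries in /etc/resolv.conf
host web-service

# Pod records have no search entry; the full name is required
host 10-244-2-5.apps.pod.cluster.local
```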


Service node ports can only be allocated from a high-numbered range, 30000-32767 by default

If using a public cloud, we can use the service type LoadBalancer to create a proxy for the service. Kubernetes then sends an additional request to the cloud provider to provision a network load balancer configured to route traffic to the service ports

When your applications scale, you need yet another proxy to redirect traffic to the multiple load balancers, which increases complexity: you have to configure firewall rules and SSL for each application

That’s where Ingress comes in. Ingress is a layer-7 load balancer built into the cluster. It helps users access applications through a single externally accessible URL that can be configured to route to different services within the cluster based on the URL path, and it can implement SSL as well

There are several solutions available for ingress:

  • GCE
  • Nginx
  • HAProxy
  • Traefik
  • Istio

The GCE and Nginx controllers are currently supported and maintained by the Kubernetes project. An Ingress Controller is just a web server with additional intelligence to monitor the Kubernetes cluster for new ingress resource definitions

To manually configure an nginx ingress controller, you will need to create these objects:

  • Deployment: expose container port 80 and 443
  • Service: expose node port 80 and 443
  • ConfigMap: store nginx configuration files
  • ServiceAccount, Role, ClusterRole, RoleBinding: for permissions to re-configure at resource changes

You might want to deploy an additional service to act as the default backend: traffic that does not match any rule is directed to this default service
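A minimal Ingress resource routing by URL path might look like this (all names are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-apps          # hypothetical name
spec:
  rules:
  - http:
      paths:
      - path: /wear
        pathType: Prefix
        backend:
          service:
            name: wear-service     # hypothetical backend service
            port:
              number: 80
```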

See the full configurations using helm chart nginx-ingress


Persistent Volume

Besides storing a volume on the host, Kubernetes provides several types of storage solutions

  • NFS
  • GlusterFS
  • Flocker
  • Ceph
  • ScaleIO
  • Azure Disk
  • Google Persistent Disk

Persistent Volume Claim

Administrators create PVs, and users create PVCs to use them. During the binding process, Kubernetes tries to find a PV that has sufficient capacity as requested by the claim and matches any other requested properties such as access modes, volume modes, storage class and selector

Note that a smaller claim may get bound to a larger volume if all the other criteria match and there are no better options

There is a one-to-one relationship between a PV and a PVC; no other claim can utilize the remaining capacity of the volume

Configure the persistentVolumeReclaimPolicy field to define what action to perform on the PV after its PVC is deleted.

  • Retain (default)
  • Delete
  • Recycle
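A minimal PV/PVC pair might look like this (names, sizes and the hostPath are examples; hostPath is only suitable for single-node testing):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example
spec:
  capacity:
    storage: 1Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /tmp/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-example
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 500Mi
```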

