

Kubernetes Short Notes (2)

  • Ops


Manual Scheduling

  • Bind the pod to a node through the nodeName property; until it is bound, the pod stays in the Pending state

  • Manual ways to bind:

    • specify spec.nodeName at creation time (it cannot be updated afterwards)

    • create a Binding object


Labels and selectors are used to group and select objects. For example, a ReplicaSet object configures:

  • metadata.labels sets the labels of the ReplicaSet itself
  • spec.template.metadata.labels sets the labels of the Pod
  • spec.selector.matchLabels defines how the ReplicaSet discovers its Pods
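
As a rough illustration of how spec.selector.matchLabels picks Pods, the sketch below treats labels as plain Python dictionaries. The function name matches_selector and the sample labels are made up for this example, and it only models equality-based matching.

```python
def matches_selector(match_labels: dict, pod_labels: dict) -> bool:
    """Return True when every selector pair is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


selector = {"app": "web"}                    # hypothetical spec.selector.matchLabels
pod_a = {"app": "web", "tier": "frontend"}   # extra labels on the pod do not matter
pod_b = {"app": "db"}

print(matches_selector(selector, pod_a))  # True
print(matches_selector(selector, pod_b))  # False
```

This is also why a pod created outside the template can be picked up by a ReplicaSet if it happens to carry the same labels.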


Annotations are used to record other details for integration purposes, e.g. build info, contact details



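Taints and tolerations prevent pods without a matching toleration from being scheduled to a tainted node

  • Taint the nodes

  • Set the pods’ tolerations; the taint effect decides what happens to pods that do not tolerate the taint:

    • NoSchedule: the pod is not scheduled to the node
    • PreferNoSchedule: the scheduler tries to avoid the node, not guaranteed
    • NoExecute: new pods are not scheduled, existing pods are evicted

Note that the values in the tolerations entries must use double quotes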
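
The matching rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the scheduler's code: taints and tolerations are plain dictionaries, the helper names tolerates and can_schedule are invented here, and the sample key/value pairs are hypothetical.

```python
HARD_EFFECTS = {"NoSchedule", "NoExecute"}  # PreferNoSchedule is only a preference


def tolerates(toleration: dict, taint: dict) -> bool:
    """Check one toleration against one taint."""
    if toleration.get("key") not in (None, taint["key"]):
        return False
    if toleration.get("effect") not in (None, taint["effect"]):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True  # Exists ignores the value
    return toleration.get("value") == taint.get("value")


def can_schedule(tolerations: list, taints: list) -> bool:
    """A pod is blocked only by NoSchedule/NoExecute taints it does not tolerate."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint["effect"] in HARD_EFFECTS
    )


taint = {"key": "app", "value": "blue", "effect": "NoSchedule"}
toleration = {"key": "app", "operator": "Equal", "value": "blue", "effect": "NoSchedule"}
soft_taint = {"key": "app", "value": "blue", "effect": "PreferNoSchedule"}

print(can_schedule([toleration], [taint]))  # True
print(can_schedule([], [taint]))            # False
print(can_schedule([], [soft_taint]))       # True
```

Note that a toleration only allows a pod onto the tainted node; it does not force the pod to land there.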

Node Selector

Limit the pod to be scheduled to one kind of node only

  • Label the node
  • Set the nodeSelector

Note that nodeSelector has no OR or NOT conditions; use node affinity for those

Node Affinity

Limit the pod to be scheduled to one or more particular nodes

  • Label the node
  • Set the nodeAffinity
  • operators: In, NotIn, Exists, DoesNotExist, Gt, Lt
  • 3 types
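
To make the operators concrete, here is a minimal sketch that evaluates a single match expression against a node's labels. The function name match_expression and the sample labels are hypothetical; treating a missing label as a match for NotIn follows the usual label-selector semantics and is an assumption of this sketch.

```python
def match_expression(node_labels: dict, key: str, operator: str, values=()) -> bool:
    """Evaluate one nodeAffinity-style expression against node labels."""
    present = key in node_labels
    if operator == "Exists":
        return present
    if operator == "DoesNotExist":
        return not present
    if operator == "In":
        return present and node_labels[key] in values
    if operator == "NotIn":
        return not present or node_labels[key] not in values
    if operator in ("Gt", "Lt"):
        if not present:
            return False
        # Gt and Lt compare the label value and the single given value as integers.
        label, bound = int(node_labels[key]), int(values[0])
        return label > bound if operator == "Gt" else label < bound
    raise ValueError(f"unknown operator: {operator}")


node = {"size": "Large", "cpu-count": "8"}  # hypothetical node labels

print(match_expression(node, "size", "In", ["Large", "Medium"]))  # True
print(match_expression(node, "size", "NotIn", ["Small"]))         # True
print(match_expression(node, "gpu", "Exists"))                    # False
print(match_expression(node, "cpu-count", "Gt", ["4"]))           # True
```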

Combine Taint/Toleration with NodeSelector or NodeAffinity to cover both scenarios: taints keep other pods off the dedicated nodes, and the selector or affinity keeps your pods on them



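  • Scheduling is based on the resource requests
  • Kubernetes itself sets no default request; a default such as 0.5 cpu and 256Mi memory only applies when a LimitRange in the namespace defines it


  • Likewise, a default limit such as 1 cpu and 512Mi memory only applies when a LimitRange defines it
  • When a container tries to use resources beyond its limit
    • cpu: k8s throttles the cpu and does not kill the container
    • memory: k8s terminates the container with an OOM (out of memory) kill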

Static Pods

Used by kubeadm to create the control plane components

Without any intervention from the kube-api server, the kubelet can manage a node independently by monitoring config files in the file system; it can create, recreate, update and delete Pod objects only

  • --pod-manifest-path=/etc/kubernetes/manifests
  • --config=kubeconfig.yaml (with staticPodPath set in that file)

Once a static pod is created, the kube-api server only gets a read-only mirror of it and cannot update or delete it

Multiple Schedulers

  • copy the kube-scheduler config from /etc/kubernetes/manifests
  • rename the scheduler with --scheduler-name
  • with one master node running multiple schedulers:
    • set --leader-elect=false
  • with multiple masters running multiple schedulers, only one instance of a scheduler can be active at a time
    • set --leader-elect=true
    • set --lock-object-name to differentiate the custom scheduler from the default one
  • specify the scheduler for a pod with schedulerName

Kubernetes Short Notes (1)

  • Ops

Cluster Architecture

Master Node

  • ETCD cluster
  • kube-scheduler
  • kube-controller-manager

These components communicate via the kube-api server

Worker Node

  • container runtime engine, e.g. Docker, rkt (Rocket), containerd
  • kubelet: an agent that runs on the node and listens for instructions from kube-api
  • containers

The services deployed on worker nodes communicate with each other via kube-proxy



  • a distributed, reliable key-value store
  • client communication on port 2379
  • server-to-server communication on port 2380


  • the primary management component

  • setup:

    1. using kubeadm

      • kube-api is deployed as a pod in the kube-system namespace

      • the manifest is at /etc/kubernetes/manifests/kube-apiserver.yaml

    2. without kubeadm

      • the options are in /etc/systemd/system/kube-apiserver.service

      • search for the kube-apiserver process on the master node

  • example: applying a deployment using kubectl

    1. kube-api authenticates the user
    2. kube-api validates the HTTP request
    3. the kube-scheduler notices the change through kube-api, then:
      • retrieves the node information from kube-api
      • schedules the pod to a node; the decision goes through kube-api to the kubelet
    4. kube-api updates the pod info in ETCD
  • kube-controller-manager

    • continuously monitors the state of components
    • the controllers are packaged into a single process called Kube-Controller-Manager, which includes:
      1. deployment-controller, cronjob, service-account-controller …
      2. namespace-controller, job-controller, node-controller …
      3. endpoint-controller, replicaset, replication-controller(replica set) …
    • remediates the situation when the actual state drifts from the desired state


    • the kube-scheduler decides which pod goes to which node
      1. filter nodes
      2. rank nodes
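
The two phases can be sketched with free CPU as the only resource. This is a deliberately simplified model with hypothetical node names and numbers; the real scheduler weighs many more criteria, and ranking by the CPU left over after placement is just one possible scoring rule assumed here.

```python
def pick_node(free_cpu_by_node: dict, cpu_request: float):
    """Filter nodes that can fit the request, then rank by CPU left after placement."""
    # Phase 1: filter out nodes that cannot satisfy the request.
    feasible = {
        name: free for name, free in free_cpu_by_node.items() if free >= cpu_request
    }
    if not feasible:
        return None  # the pod would stay Pending

    # Phase 2: rank the remaining nodes; here the most CPU left over wins.
    return max(feasible, key=lambda name: feasible[name] - cpu_request)


nodes = {"node-a": 4, "node-b": 12, "node-c": 16}  # hypothetical free CPU per node

print(pick_node(nodes, 10))  # node-c
print(pick_node(nodes, 20))  # None
```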


    • follows the scheduling decisions to control the container runtime engine (e.g. docker), which runs or removes containers
    • when deploying a cluster with kubeadm, the kubelet is not installed on worker nodes by default; install it manually


    • runs on each node in the cluster
    • creates iptables rules on each node to forward traffic heading to the IP of a service to the IP of the actual pods
    • kubeadm deploys kube-proxy as a DaemonSet on each node


    • containers are encapsulated in a pod
    • a pod is a single instance of an application, the smallest object in k8s
    • containers in the same pod share storage and the network namespace, and are created and removed at the same time
    • a multi-container pod is a rare use case


    • apiVersion is v1
    • the process that monitors the pods
    • maintains HA and the specified number of pods running across all nodes
    • only cares about pods whose RestartPolicy is set to Always
    • scalable and replaceable applications should be managed by the controller
    • use cases: rolling updates, multiple release tracks (multiple replication controllers replicate the same pod but use different labels)


    • the next generation of ReplicationController
    • apiVersion is apps/v1
    • enhanced filtering in .spec.selector (the major difference)
    • be aware of non-template pods that carry the same labels
    • using a Deployment instead is recommended; it owns and manages its ReplicaSets


    • provides replication via ReplicaSet, plus:
      • rolling update
      • rollout
      • pause and resume


    • namespaces created at cluster creation:

      1. kube-system
      2. kube-public
      3. default

    • each namespace can be assigned a resource quota

    • a DNS entry in the SERVICE_NAME.NAMESPACE.svc.cluster.local format is created automatically when a service is created

      1. cluster.local is the default domain name of the cluster

    • the namespace can be configured permanently
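
The DNS naming format for services mentioned above can be spelled out with a tiny helper. The function name service_dns_name and the sample service and namespace names are hypothetical; only the SERVICE_NAME.NAMESPACE.svc.cluster.local pattern comes from the notes.

```python
def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified DNS name of a service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


print(service_dns_name("db-service", "dev"))  # db-service.dev.svc.cluster.local
print(service_dns_name("web"))                # web.default.svc.cluster.local
```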



    • Ops


    Compared to memcached:

    • supports persistence
      • RDB
      • AOF
    • supports multiple data types
    • pub/sub


    • redis-cli: command line interface
    • redis-sentinel: cluster managing tool
    • redis-server: run server
    • redis-benchmark: stress testing
    • redis-check-aof: check AOF
    • redis-check-dump: check RDB


    Use redis.conf. The official Docker redis image does not contain this file; mount it yourself or pass the settings as redis-server arguments.


    • String: get, set, mget, mset
    • Integer: incr, decr, setbit
    • List: lpush, lrange, lpop
    • Hash Map: hset, hget, hmset, hmget
    • Set: sadd, smembers, sdiff, sinter, sunion

    Using Docker

    Before starting

    To connect to a container, you need to know its name and port, and be on an associated network to be able to discover the service.

    There is no DNS resolution in Docker's default bridge network. On the default network, you need to specify --link to connect the containers, and --link is a legacy feature.

    Therefore, creating a user-defined network is recommended; it provides automatic DNS resolution.

    Create a bridge network

    Run a redis instance in the user-defined network

    Run a redis-cli to connect to the redis instance


    all commands in a transaction are serialized and executed sequentially, as a single isolated operation
    atomic: either all of the commands are processed or none are

    • MULTI: opens a transaction and always returns OK
    • EXEC: executes the commands queued in the transaction
    • DISCARD: flushes the queued commands and exits the transaction
    • WATCH: check-and-set; if a watched key changes before EXEC, the transaction is not executed


    • before EXEC: a command fails to be queued, e.g. a syntax error
    • after EXEC: a command fails while running, e.g. an operation against a key holding the wrong type of value

    The transaction is discarded automatically if there was an error during command queueing
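
The difference between the two kinds of errors can be illustrated with a toy in-memory model. This is not a Redis client and does not talk to a server: the class ToyTransaction, its method names and the two supported commands are invented for this sketch, which only mimics the queue-then-execute behaviour described above.

```python
class ToyTransaction:
    """Toy model of MULTI/EXEC queueing over a plain dict."""

    KNOWN_COMMANDS = {"SET", "INCR"}

    def __init__(self, store: dict):
        self.store = store
        self.queue = []
        self.dirty = False  # set when a command could not be queued

    def queue_command(self, name: str, *args) -> str:
        if name not in self.KNOWN_COMMANDS:
            self.dirty = True  # error before EXEC
            return "ERR unknown command"
        self.queue.append((name, args))
        return "QUEUED"

    def exec(self):
        if self.dirty:
            self.queue.clear()
            return None  # the whole transaction is discarded
        results = []
        for name, args in self.queue:
            if name == "SET":
                key, value = args
                self.store[key] = value
                results.append("OK")
            else:  # INCR
                (key,) = args
                try:
                    self.store[key] = str(int(self.store.get(key, "0")) + 1)
                    results.append(int(self.store[key]))
                except ValueError:
                    # error after EXEC: later commands still run
                    results.append("ERR value is not an integer")
        self.queue.clear()
        return results


store = {}
tx = ToyTransaction(store)
tx.queue_command("SET", "a", "x")
tx.queue_command("INCR", "a")      # will fail only at EXEC time
tx.queue_command("SET", "b", "1")
print(tx.exec())   # ['OK', 'ERR value is not an integer', 'OK']

aborted = ToyTransaction(store)
aborted.queue_command("SET", "c", "1")
aborted.queue_command("BOGUS")     # fails while queueing
print(aborted.exec())  # None
```

In this model, an error during queueing throws away everything, while an error during execution leaves the other commands applied.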

    … To be continued


    • Ops


    Stores and retrieves data in memory (not persistent) based on a specific hash function.


    • Slab: allocate as many pages as the ones available

    • Page: a memory area of default 1MB which contains as many chunks

    • Chunk: minimum allocated space for a single item

    • LRU: least recently used list

    ref: Journey to the centre of memcached

    we could say that we would run out of memory when all the available pages are allocated to slabs

    memcached is designed to evict old/unused items in order to store new ones

    every item operation (get, set, update or remove) requires the item in question to be locked

    memcached only tries to remove the first 5 items of the LRU — after that it simply gives up and answers with OOM (out of memory)
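
    The idea of evicting the least recently used item to make room for a new one can be sketched with a small cache built on OrderedDict. This is only a minimal illustration of LRU eviction; the class name LRUCache is made up here, and memcached's own implementation (slabs, pages, chunks, locking) is considerably more involved.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used item."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")         # "a" is now the most recently used
cache.set("c", 3)      # evicts "b"

print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```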


    IP Subnetting

    • Ops

    Something you need to know first: the Binary Odometer, which counts up by repeatedly adding 1 and, in reverse, counts down by subtracting 1

    Example 1 or with the mask

    Binary Method


    Quick Method

    Figure out the subnets:

    1. the network and host split falls in the third octet
    2. subtract: 256 – 240 = 16, which means the networks increment in steps of 16: 0, 16, 32, 48…
    3. 35 in the range of 32 and 48, so is on subnet; next subnet is

    First subnet =

    Next subnet =

    Broadcast address = next subnet – 1

    First host = Subnet + 1

    Last host = Broadcast – 1
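
    The quick method can be cross-checked with Python's ipaddress module. The address below is hypothetical: only the third octet (35) and the mask octet (240) come from the steps above, while the other octets are filled in for illustration.

```python
import ipaddress

# Hypothetical host address; the mask 255.255.240.0 splits network and host in the third octet.
interface = ipaddress.ip_interface("172.16.35.10/255.255.240.0")
network = interface.network

block_size = 256 - 240  # subnets increment by 16 in the third octet
next_subnet = network.network_address + network.num_addresses
broadcast = next_subnet - 1                # broadcast = next subnet - 1
first_host = network.network_address + 1   # first host = subnet + 1
last_host = broadcast - 1                  # last host = broadcast - 1

print(block_size)   # 16
print(network)      # 172.16.32.0/20
print(next_subnet)  # 172.16.48.0
print(broadcast)    # 172.16.47.255
print(first_host)   # 172.16.32.1
print(last_host)    # 172.16.47.254
```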


    • Class A subnetting: supports 16777214 (2^24 – 2) hosts per network, which is far too many
    • Class B subnetting: supports 65534 (2^16 – 2) hosts per network, which is still too many
    • Class C subnetting: supports 254 (2^8 – 2) hosts; more likely we subnet down to 254 hosts or even fewer

    If a subnet only has 2 hosts, you can subnet down to /31 in CIDR notation. A /31 works for point-to-point links (RFC 3021); under the 2^(host bits) – 2 rule, a /30 is the smallest subnet with 2 usable hosts

    Network, host number

    • Networks: 2^(network bits)
    • Hosts: 2^(host bits) – 2
      • one address is reserved for the subnet (network) address
      • one address is reserved for the broadcast address

    Subnetting in short

    1. “stealing” or “taking away” bits from the host portion of an address, and
    2. allocating those bits to network portion

    Example 2

    The original network needs at least 30 subnets, each with as many hosts as possible


    1. draw the line at /18 to split network and host
    2. 2^5 = 32 ≥ 30, so 5 subnet bits are needed; draw a second line to split subnet and host
    3. the network/subnet portion is 8+8+7=23 bits, the host portion is 32-23=9 bits
    • First subnet:
    • Second subnet:
    • Last subnet:
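
    The same steps can be checked with the ipaddress module. The /18 network below is a hypothetical stand-in chosen for illustration; only the /18 prefix, the 5 subnet bits and the resulting 23/9 bit split come from the example.

```python
import ipaddress

origin = ipaddress.ip_network("172.16.0.0/18")  # hypothetical original network
subnet_bits = 5                                  # 2**5 = 32 subnets, enough for 30

subnets = list(origin.subnets(prefixlen_diff=subnet_bits))
host_bits = 32 - subnets[0].prefixlen

print(len(subnets))        # 32
print(subnets[0])          # 172.16.0.0/23
print(subnets[1])          # 172.16.2.0/23
print(subnets[-1])         # 172.16.62.0/23
print(host_bits)           # 9
print(2 ** host_bits - 2)  # 510 usable hosts per subnet
```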

    Improving Cloud Deployments with Cloud Build and Helm

    • Ops

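    Managing services on Kubernetes often comes with a few pain points:

    1. Different settings and environment variables have to be applied per environment (staging, production…), which makes integration difficult
    2. Secrets are often added by hand, such as the credentials for cloudsql-proxy; over time it is easy to forget what a secret is for, and redeploying the whole service gets stuck on this manual step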

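    Helm helps us manage these deployment files:

    1. One-command deployment and removal
    2. Different variables per environment, with several workable approaches
    3. Configuration files generated from flexible conditionals
    4. Version control of charts (Release)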


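    1. A cluster
    2. Helm installed on the client side and Tiller on the server side
    3. If the cluster has RBAC enabled, either disable RBAC or grant Tiller permissions:
      • helm init uses the default service account by default
      • bind the cluster's default service account to the cluster-admin cluster role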

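    Helm provides a dependency feature; all subcharts can be deployed with the following command:

    There is a problem here, though: at deploy time you cannot specify which values file a subchart should use. If your chart parameters are split up, then once the chart is packaged, the package can only read values.yaml

    A future version may provide a way to handle this; see the related issue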

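    Deploying through Cloud Build

    After creating and configuring the chart with helm create, we want to deploy with helm inside the Cloud Build pipeline. Here we use the helm image provided by cloud-builders-community

    1. Build the image and push it to the project's Container Registry
    2. Follow the example to add build steps, such as helm install and helm upgrade
    3. If RBAC is enabled, Cloud Build must be granted permission to operate the cluster
      • bind the Cloud Build service account to the roles/container.admin role and the cluster-admin cluster role; see the related commands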