Devops

Classic Shell Scripting Reading Notes (1)

  • Devops

Getting Started

printf

  • A format specification is a kind of placeholder consisting of 1) a percent sign (%) and 2) a specifier; the most commonly used are %s for strings and %d for decimal integers
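A minimal sketch of both specifiers in a POSIX shell (the name and number are made up):

    printf "%s is %d years old\n" "Alice" 30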

tr

Example: translate a DOS file to UNIX format

  • tr -d: delete the characters listed in source-char-list from stdin
  • \r: the ASCII carriage return
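A sketch of the conversion (file names are placeholders):

    tr -d '\r' < dosfile.txt > unixfile.txt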

/dev/tty

Example: read a password via /dev/tty

  • When a program opens /dev/tty, UNIX redirects it to the terminal attached to that program; the terminal can be 1) a physical console, 2) a serial port, or 3) a pseudoterminal
  • stty (set tty) controls terminal settings; the echo/-echo options turn the automatic echoing of typed input on and off
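A minimal sketch of the idea, assuming a POSIX shell:

    printf "Enter password: " > /dev/tty   # prompt on the terminal, not on stdout
    stty -echo < /dev/tty                  # stop echoing typed characters
    read password < /dev/tty               # read the password from the terminal
    stty echo < /dev/tty                   # turn echoing back on
    printf "\n" > /dev/tty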

i18n and l10n

  • When i18n (internationalization) is done as part of designing the software, the software can later serve a particular audience without further modification or recompilation
  • When l10n (localization) is done as part of designing the software, the goal is to make the software usable by a specific audience; it includes translating output text and formatting currency, dates, times, units, and so on
  • From the user's point of view, the feature that controls which language or cultural environment is in effect is called the locale
  • Apart from C and POSIX, locale names are not standardized
  • BSD and Mac OS X do not support locales at all
  • Locale support is still immature: shell scripts are often affected by the locale, and on most UNIX systems it is hard to tell from the locale files and tools which characters a character class or equivalence class actually contains, and which collating symbols are available
  • Shell script developers should understand how locales affect their code

List all locales
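On GNU/Linux and most modern systems:

    locale -a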

Get information about a specific locale
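A sketch with the GNU locale tool (-c prints the category name, -k prints keyword names):

    locale -ck LC_TIME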

Define LC_ALL to override the default locale; among the variables that can be queried are LC_TIME (date and time formats) and LC_MONETARY (currency formats)
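For example (the fr_FR.UTF-8 locale is an assumption and must be installed on the system):

    LC_ALL=fr_FR.UTF-8 date
    LC_ALL=fr_FR.UTF-8 locale -ck LC_MONETARY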

Matching text

  • The 1992 POSIX standard merged grep, egrep, and fgrep into a single grep program controlled by different options
    • the original grep: uses BREs (Basic Regular Expressions) and can only match a single pattern
    • egrep: uses EREs (Extended Regular Expressions), also matching a single pattern
    • fgrep: does not use regular expressions; a special algorithm lets it match multiple fixed strings at the same time
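A sketch of the merged interface of modern grep (the file name is a placeholder):

    grep 'wh.*y' file                # BRE, the default
    grep -E 'read|write' file        # ERE, the old egrep
    grep -F -e 'foo' -e 'bar' file   # fixed strings, the old fgrep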

Regular expressions

Some important POSIX BRE/ERE metacharacters

Which regular expression flavor each Unix program uses

Bracket expression

What UNIX used to call range expressions are, under the POSIX standard, now called bracket expressions. They include three kinds of character-set constructs:

  • Character classes: [:keyword:], where the keyword names a set of characters such as alpha or digit; for example, [[:alpha:]!] matches any letter or an exclamation mark
  • Collating symbols: [.collating-element.], which depend on the locale; in some non-English languages certain pairs of characters must be treated as a single unit. In Czech or Spanish, for example, the two characters ch stay together as one collating element, and with locale support they can be matched with an expression such as [ab[.ch.]de]
  • Equivalence classes: [=char=], which list a set of characters that should be treated as equivalent; in a French locale, [[=e=]] might match e, é, and ë
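A sketch of the three constructs with grep; whether [.ch.] and [=e=] actually match depends on the current locale:

    grep '[[:alpha:]!]' file    # any letter, or an exclamation mark
    grep '[ab[.ch.]de]' file    # a, b, d, e, or the collating element "ch"
    grep '[[=e=]]' file         # e and its accented equivalents, e.g. in a French locale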

BRE

  • A preceding \ escapes a metacharacter; commonly used escapes are \\ and \[
  • Inside a bracket expression, a ^ in the first position means complement
  • A bracket expression can express a range of characters, such as [0-9a-fA-F], but ranges are interpreted according to the machine's character set (ASCII vs. EBCDIC) and are therefore not portable; prefer the POSIX character-class notation instead
  • Inside a bracket expression, metacharacters lose their special meaning
  • Backreferences are built with \( \) for grouping and \digit to refer back to the group later, e.g. \(["']\).*\1 matches any word enclosed in matching quotes
  • The anchor metacharacters ^ and $ match at the beginning and the end of the string, respectively

Example: remove blank lines beforehand
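A sketch using the anchors (either command works):

    grep -v '^$' file    # drop empty lines
    sed '/^$/d' file     # the same idea with sed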

BRE operator precedence

From highest to lowest:

  1. [..], [::], [==] (bracket symbols)
  2. \metacharacter (escaped metacharacters)
  3. [ ] (bracket expressions)
  4. \( \), \digit (subexpressions and backreferences)
  5. *, \{ \} (repetition)
  6. no symbol (concatenation)
  7. ^, $ (anchors)

ERE

  • For matching a single string, EREs behave the same as BREs; the differences lie in matching multiple strings
  • There are no backreferences
  • ? and + give finer control over repetition
  • A bracket expression expresses "match one of these characters"; the alternation operator | expresses "match this sequence, or that sequence"
  • ( ) is used for grouping
  • ^ and $ are always metacharacters; unlike in BREs, placing them in the middle of a pattern makes it match nothing

Example: match multiple consecutive occurrences of read or write, possibly separated by whitespace
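A sketch with grep -E (the file name is a placeholder):

    grep -E '((read|write)[[:space:]]*)+' file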

ERE operator precedence (highest to lowest)

  1. [..], [::], [==] (bracket symbols)
  2. \metacharacter (escaped metacharacters)
  3. [ ] (bracket expressions)
  4. ( ) (grouping)
  5. *, +, ?, { } (repetition)
  6. no symbol (concatenation)
  7. ^, $ (anchors)
  8. | (alternation)

Text substitution: sed

  • / acts as the delimiter separating the regular expression from the replacement text
  • Any printable character can serve as the delimiter instead of /; when handling file names, punctuation such as ;, :, or , is commonly used, because escaping every slash in a path looks ugly
  • If the s command ends with g, the replacement is global; if it ends with a number n, only the n-th match on the line is replaced
  • sed remembers the last regular expression it encountered in a -f script; an empty regular expression reuses it
  • The files named on the command line are opened and read in order; if there are none, standard input is used, and a single dash - also denotes standard input
  • As each line is read it is placed in an area of memory called the pattern space; successive edits overwrite that same area, and at the end the final contents of the pattern space are written out
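A few sketches of the flags and delimiters described above (file names and patterns are placeholders):

    sed 's/foo/bar/g' file          # replace every occurrence on each line
    sed 's/foo/bar/3' file          # replace only the third occurrence on each line
    sed 's;/usr/local;/opt;' file   # ; as the delimiter: no need to escape slashes in paths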

Example: extract the first field (replace everything from the first colon onward with the empty string)
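A sketch on /etc/passwd:

    sed 's/:.*//' /etc/passwd    # leaves only the user name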

Example: create a copy of the directory structure of /home/tolstoy under /home/lt, tracing execution

  • Uses the generating-commands technique: the generated command text becomes input to the shell
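A sketch along the lines of the book's example (the exact sed expressions are an approximation):

    find /home/tolstoy -type d -print |     # list every directory under the source tree
        sed 's;/home/tolstoy;/home/lt;' |   # rewrite the path prefix
        sed 's/^/mkdir -p /' |              # turn each path into a mkdir command
        sh -x                               # feed the generated commands to a shell, tracing each one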

Example: convert HTML to XHTML
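One possible fragment of such a conversion, assuming the input uses unclosed uppercase tags (the substitutions shown are illustrative, not the book's full script):

    sed -e 's;<BR>;<br/>;g' \
        -e 's;<HR>;<hr/>;g' \
        -e 's;<P>;<p>;g' file.html > file.xhtml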

Example: simulate grep, printing only the lines that contain a given pattern
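A sketch ("pattern" stands for whatever the original example searched for):

    sed -n '/pattern/p' file    # -n suppresses automatic printing; p prints matching lines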

Addressing

By default, every editing command applies to every input line. To restrict which lines a command applies to, prefix the command with an address; the kinds of addresses are listed below, with a combined sketch after the list. Note that addresses should be protected with single quotes.

  1. A regular expression

    Example: add comments to part of the code

  2. The last line (the symbol $ means "last line" in sed)

    Example: print the last line of a file

  3. An absolute line number

    Example: simulate head; the q command tells sed to quit immediately without reading any more input

  4. A range (two absolute line numbers or regular expressions separated by a comma)

    Example: print only lines 10 through 42

    Example: substitute only on lines within the range

  5. A negated regular expression

    Note: it is best not to put a space after the !, since some older versions of sed cannot handle it

    Example: on every line that does not contain used, replace new with used
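A combined sketch of the address forms above (file names, line numbers, and patterns are placeholders):

    sed -n '$p' file                     # 2) last line only
    sed '10q' file                       # 3) absolute line number: print lines 1-10, then quit (like head)
    sed -n '10,42p' file                 # 4) range of line numbers: print only lines 10 through 42
    sed '/^BEGIN/,/^END/s/^/# /' file    # 1) regex range: comment out the lines between two markers
    sed '/used/!s/new/used/g' file       # 5) negated regex: replace new with used on lines without used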

Changing the delimiter

Example: use : to delimit the address regular expression, and ; as the delimiter of the s command
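A sketch (the paths are placeholders); a backslash introduces the alternative delimiter for the address:

    sed '\:/home/tolstoy/:s;/home/tolstoy/;/home/lt/;' file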

The longest-leftmost matching rule

The POSIX standard says: "Consistent with the whole match being the longest of the leftmost matches, each subpattern, from left to right, shall match the longest possible string."

Watch out for matches of the null string
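For example, with GNU sed (b* also matches the empty string, so a replacement happens even where there is no b):

    echo abc | sed 's/b*/X/g'    # prints XaXcX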

Selecting fields: cut

Because the fields on a line vary in length, cutting data by character position (-c) is risky; it is generally better to extract data by field (-f) together with a delimiter (-d)
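A sketch on the colon-delimited /etc/passwd:

    cut -d : -f 1,6 /etc/passwd    # user name and home directory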

Joining fields: join

Example: delete comments, replace the delimiter with whitespace, sort, and then join the files
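A sketch with two hypothetical colon-delimited files that share a key in their first field:

    sed -e '/^#/d' -e 's/:/ /g' sales  | sort > sales.clean
    sed -e '/^#/d' -e 's/:/ /g' quotas | sort > quotas.clean
    join sales.clean quotas.clean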

Rearranging fields: awk

  • The program structure is pattern { action }; the pattern is usually an ERE delimited by /, and the action is an awk statement. The meaning is "if the pattern is true, execute the action"

  • Remember to separate print arguments with commas, otherwise the fields are concatenated

  • The BEGIN and END patterns

Example: print specific fields with a custom output separator
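A sketch on /etc/passwd (fields 1 and 6 are the user name and home directory):

    awk -F : -v OFS='\t' '{ print $1, $6 }' /etc/passwd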

Example: define variables and mix literal text using printf
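A sketch mixing literal text with fields via printf (field 7 of /etc/passwd is the login shell):

    awk -F : '{ printf "User %s uses %s as the login shell\n", $1, $7 }' /etc/passwd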

深入理解 Nginx Reading Notes (Chapter 2)

Relationships between processes

  • Nginx can serve with only a single (master) process
  • The normal deployment uses one master process to manage multiple worker processes
  • With the number of workers equal to the number of CPU cores, the cost of process switching is minimized

Benefits of using multiple processes

  1. The master process focuses purely on management, providing command-line control for the administrator (start, stop, reload configuration, upgrade)
  2. The master process needs relatively high privileges and is usually started as root
  3. When one worker process fails, the other workers can keep serving normally
  4. Fully exploits SMP (symmetric multiprocessing) multi-core architectures, achieving true multi-core concurrency at the micro level
  5. Workers normally do not sleep: each can handle many requests concurrently, unlike Apache, where each process handles only one request at a time and process switching is therefore expensive

Configuration syntax

Each module has the configuration directives it is interested in; most modules are activated only after a certain directive is read from nginx.conf. For example, only when http {…} is configured is the ngx_http_module activated, and only then can the modules that depend on it work properly

Block configuration directives

  • Composed of a name and a pair of braces; http, server, and location are all block directives
  • The arguments a block accepts depend on the module that parses it
  • The braces mean that the directives enclosed in them take effect together
  • Blocks can be nested, and an inner block directly inherits from its outer block
  • When an inner and an outer setting conflict, which one wins depends on the module that parses the block, e.g. the gzip switch in the example
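A minimal nginx.conf fragment sketching the nesting and inheritance described above (the gzip override is the kind of conflict the last bullet refers to):

    http {
        gzip on;                  # outer setting, inherited by the blocks below
        server {
            listen 80;
            location /download/ {
                gzip off;         # inner setting overrides the outer one for this location
            }
        }
    }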

Directive syntax

  • The name must be legal (i.e. one that some Nginx module wants to handle)
  • The arguments accepted depend on the module that parses the directive
  • If any argument contains whitespace, it must be wrapped in single or double quotes
  • Ends with a semicolon
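A sketch of simple (non-block) directives:

    worker_processes 4;
    error_log logs/error.log error;
    log_format main '$remote_addr - "$request" $status';   # an argument containing spaces must be quoted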


深入理解 Nginx Reading Notes (Chapter 1)

Why choose Nginx

  1. Faster: 1) faster response to a single request, 2) faster response than other servers under peak load
  2. Highly extensible: 1) composed of modules with extremely low coupling, 2) all modules are embedded into the binary and executed there
  3. Highly reliable: 1) stable modules, 2) relatively independent processes, 3) a failed worker can be replaced quickly
  4. Low memory consumption: 1) 10,000 inactive HTTP keep-alive connections consume only about 2.5 MB
  5. High concurrency: 1) a single machine can support more than 100,000 concurrent connections
  6. Hot deployment: 1) based on the separation of master and worker processes, 2) the executable, the configuration, and the log files can be upgraded or replaced without interrupting service
  7. BSD license

Preparing for development

Required

  1. Linux kernel 2.6 or later (epoll is required to handle high concurrency)
  2. GCC, to compile the C code

Optional

  1. G++, to compile C++ when writing HTTP modules in C++
  2. PCRE (Perl Compatible Regular Expressions), to use regular expressions in the configuration file; pcre-devel is needed for secondary development against PCRE
  3. zlib, to gzip-compress HTTP content and reduce the amount of data transferred over the network
  4. OpenSSL, to support SSL, or if you want MD5 or SHA hashing

Directory structure

  1. Source code directory
  2. Intermediate build files (placed under the source directory, named objs)
  3. Deployment directory (default: /usr/local/nginx)
  4. Log directory

Linux kernel parameter tuning

  1. Kernel parameters need to be adjusted so that Nginx can reach higher performance
  2. They are usually tuned to the workload: a content server, a reverse proxy, or a thumbnailing server would each be tuned differently
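A hypothetical /etc/sysctl.conf fragment with the kind of parameters typically tuned for a busy Nginx server (the values are illustrative, not recommendations):

    fs.file-max = 999999
    net.core.somaxconn = 4096
    net.ipv4.tcp_max_syn_backlog = 8192
    net.ipv4.tcp_tw_reuse = 1

Apply the changes with sysctl -p.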


HBase Basics

  • Devops

Apache HBase is an open-source, scalable, consistent, low-latency, random-access data store

Source from Infinite Skills

Features

Horizontally Scalable

A linear increase in servers results in a linear increase in storage capacity and I/O operations


CAP Trade off

In terms of the CAP theorem, HBase is closer to a CP system

  • Consistency: ACID (atomicity, consistency, isolation, durability) guarantees on rows
  • Availability: response times of 2-3 ms from cache, 10-20 ms from disk
  • Partition tolerance: failures don’t block the system; it may take longer to respond in order to maintain consistency

Dependencies

Apache ZooKeeper

  • Used for distributed coordination of leaders, for high availability
  • Optimized to be highly available for reads
  • Not designed to scale for high write throughput

Apache Hadoop HDFS

  • Provides data durability and reliability
  • Optimized for sequential reads and writes of large files
  • Does not provide random updates; offers only a simple API for random reads
  • Cannot scale to tens of billions of small entities (less than a few hundred MB)

Both systems have their strengths, but neither individually provides the same properties as HBase

Random Access

Optimized for small random reads

  • Entities indexed for efficient random reads

Optimized for high throughput random writes

  • Updates without requiring a read
  • Random writes via Log Structured Merge (LSM)

Short History

Inspired by Google’s Bigtable

Bigtable: A Distributed Storage System for Structured Data(2006)

BigTable

Datastore for Google’s Web Crawl Table

  • Store web page content
  • Web URL as key
  • Use MapReduce to find links and generate backlinks
  • Calculate page rank to build the Google index

Later, it was also used as the backend for Gmail, Google Analytics, Google Earth, etc.

Hadoop HDFS

Inspired by Google’s distributed file system, GFS

Timeline

Since 2009, many companies (Yahoo, Facebook, eBay, etc.) have chosen HBase for large-scale production use cases

In 2015, Google announced BigTable with HBase 1.0 compatible API support for its compute engine users

2017, HBase 2.0.0

2020, HBase 3.0.0

Despite HBase being bucketed into the NoSQL category of data storage, some interesting projects are moving NoSQL back toward SQL by using HBase as the storage engine for SQL-compliant OLTP database systems.

Use case

HBase’s strengths are its ability to scale and to sustain high write throughput

Many HBase apps are:

  • Ports from RDBMS to HBase
  • New low-latency big data apps

Porting an RDBMS to HBase

  • Many RDBMSs are painful to scale
  • Scaling up is no longer practical for massive data
  • Data inconsistency was not acceptable when scaling reads
  • Operations get more complicated as the number of replicas increases
  • Operational techniques are not sufficient when scaling writes

To make scaling easier, we have to discard fundamental features that an RDBMS provides, such as:

  • text search (LIKE)
  • joins
  • foreign keys and avoid constraint checks

By changing the schema so that it contains only denormalized tables, we no longer incur replication I/O when sharding the RDBMS

At that point, porting the RDBMS to HBase is relatively straightforward

Why choose HBase instead?

  • When your app needs high write and read throughput
  • When you are tired of the RDBMS’s fragile scaling operations

Data Volumes

  • Entity data: information about the current state of a particular person or thing
  • Event data (or time-series data): records of events that are generally spaced over many time intervals

Data volume explodes when we need both of them

HBase or Not

Q: Does your app expect new data to be available immediately after an update?

  • Yes: Use HBase
    • When data is queried, it must reflect the most recent values
    • Query responses are expected in milliseconds
  • No: No need for HBase

Q: Is your app analytical or operational?

  • Analytical: Not optimal for HBase
    • Looks at large sets of data
    • Often filters for a particular time range
    • Hadoop is the better choice
  • Operational: Use HBase
    • Looks up a single entity or a small set of entities

Q: Does your app expect updates to be available immediately after an update?

  • Yes: Use HBase
    • Frequently modified
    • Pinpoint deletes
    • Updates must be reflected within milliseconds
  • No: No need for HBase
    • Data is append-only
    • Deletes in bulk or never
    • Updates can be ignored until the next report is run

Comparison

Workload      HBase                                                          Hadoop
Low Latency   1 ms from cache, 10 ms from disk                               1 min via MR/Spark, 1 s via Impala
Random Read   Rowkey is the primary index                                    The small-file problem
Short Scan    Sorted and efficient                                           Bespoke partitioning can help
Full Scan     Possible but non-optimal; improved perf with MR on snapshots   Optimized with MR, Hive, Impala
Updates       Optimized                                                      Not supported


Kubernetes Short Notes(4)

  • Devops

Storage

Persistent Volume

Besides storing a volume on the host, Kubernetes provides several types of storage solutions

  • NFS
  • GlusterFS
  • Flocker
  • Ceph
  • ScaleIO
  • AWS EBS
  • Azure Disk
  • Google Persistent Disk

Persistent Volume Claim

Administrators create PVs, and users create PVCs to use them. During the binding process, Kubernetes tries to find a PV that has sufficient capacity for the claim and that matches any other requested properties such as access modes, volume modes, storage class, and selector

Note that a smaller claim may get bound to a larger volume if all the other criteria match and there are no better options

There is a one-to-one relationship between a PV and a PVC; no other claim can use the remaining capacity of the volume

Configure the persistentVolumeReclaimPolicy field to define what happens to the PV after its PVC is deleted.

  • Retain (default)
  • Delete
  • Recycle
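A hypothetical sketch of a PV and a matching PVC (names, sizes, and the hostPath backend are made up); the 500Mi claim may bind to the 1Gi volume if no better candidate exists:

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-example
    spec:
      capacity:
        storage: 1Gi
      accessModes: ["ReadWriteOnce"]
      persistentVolumeReclaimPolicy: Retain
      hostPath:
        path: /data/pv-example
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pvc-example
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 500Mi
    EOF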

Networking

Linux Networking Basics

Network Switch

A switch is a device that connects other devices together in a computer network; it only enables communication within a single network

Host A(192.168.1.10)[eth0] ↔ Switch(192.168.1.0) ↔ [eth0]Host B(192.168.1.11)

Router

A router is a device/service that provides the function of routing IP packets between networks

Switch(192.168.1.0) <–> [192.168.1.1]Router[192.168.2.1] <–> Switch(192.168.2.0)

Route/Gateway

A gateway (in networking terms) is a router viewed in terms of the connectivity it provides: it is the device a host sends traffic to in order to reach other networks

Default Gateway

If none of the forwarding rules in the routing table is appropriate for a given destination address, the default gateway is chosen as the router of last resort

Forwarding packets between interfaces

By default in Linux, packets are not forwarded from one interface to the next, for security reasons

Explicitly allow it:
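A sketch (requires root):

    sysctl -w net.ipv4.ip_forward=1    # or: echo 1 > /proc/sys/net/ipv4/ip_forward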

Persist the setting:
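For example, in /etc/sysctl.conf:

    net.ipv4.ip_forward = 1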

DNS

Host names can be translated to IP addresses by configuring /etc/hosts
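Hypothetical /etc/hosts entries:

    192.168.1.11   db
    192.168.1.12   web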

When an environment has too many entries and IP addresses are not stable, we need a DNS server

The host looks up an entry in /etc/hosts first, then queries DNS; this order can be changed in the configuration file /etc/nsswitch.conf

You can configure the DNS server to forward unknown host names to a public name server on the Internet, for example to reach www.google.com

private DNS → Root DNS → .com DNS → google DNS → cache the result

When looking for a host in the same domain, we want to simply use the short host name rather than the full name, e.g. web instead of web.mycompany.com; to do this, specify the domain name to append in /etc/resolv.conf
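A hypothetical /etc/resolv.conf:

    nameserver 192.168.1.100
    search mycompany.com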

There are records stores in DNS with specific types:

  • A: IPv4 address
  • AAAA: IPv6 address
  • CNAME: name-to-name mapping

You can use tools like nslookup and dig to debug; note that nslookup queries only the DNS, not /etc/hosts

There are plenty of DNS solutions, such as CoreDNS; besides configuration from files, CoreDNS supports other ways of configuring DNS entries through plugins, such as the kubernetes plugin

Network Namespace

A namespace is a way of scoping a particular set of identifiers

Linux provides namespaces for networking and processes; a process running within a process namespace can only see and communicate with other processes in the same namespace

Linux starts up with a default network namespace

Each network namespace has its own routing table and its own set of iptables rules

Connect namespaces together using a virtual Ethernet pair (or virtual cable, pipe)
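A sketch connecting two namespaces with a veth pair (names and addresses are made up):

    ip netns add red
    ip netns add blue
    ip link add veth-red type veth peer name veth-blue
    ip link set veth-red netns red
    ip link set veth-blue netns blue
    ip -n red  addr add 192.168.15.1/24 dev veth-red
    ip -n blue addr add 192.168.15.2/24 dev veth-blue
    ip -n red  link set veth-red up
    ip -n blue link set veth-blue up
    ip netns exec red ping -c 1 192.168.15.2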

When more namespaces need to be connected, use a virtual switch to create a virtual network. There are a few solutions:

  • Linux Bridge
  • Open vSwitch


When a private virtual network needs to reach the outside network, it needs a gateway; the host acts as that gateway

For the destination network to respond, enable NAT on the host acting as the gateway.

Add a new rule to the NAT table’s POSTROUTING chain to masquerade, i.e. replace the source address of all packets coming from the source network 192.168.15.0 with the host’s own IP address.

Thus anyone receiving these packets outside the network will think that they are coming from the host and not from within the namespaces
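A sketch of that rule (the subnet follows the text above):

    iptables -t nat -A POSTROUTING -s 192.168.15.0/24 -j MASQUERADE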

Add a route via the host as the default gateway so the namespace can reach the outside world
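For example (192.168.15.5 stands for a hypothetical address of the host-side bridge):

    ip -n blue route add default via 192.168.15.5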

For the outside world to reach a namespace in the private network, add a port-forwarding rule with iptables saying that any traffic coming to port 80 on the host is to be forwarded to port 80 on the IP assigned to the namespace
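A sketch of such a rule (the namespace IP is a placeholder):

    iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 192.168.15.2:80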


Kubernetes Short Notes(3)

  • Devops

Cluster Maintenance

OS Upgrade

Pod Eviction Timeout

When a node is down for more than 5 minutes (the default pod eviction timeout), its pods are terminated; a pod is recreated elsewhere if it belongs to a ReplicaSet

Drain, Cordon, Uncordon

If we are not sure the node will come back online within 5 minutes, we can drain the node first.

After the drained node is upgraded and comes back, it is still unschedulable; uncordon the node to make it schedulable again.

Note that the previously evicted pods are not automatically rescheduled back onto the node.
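A sketch of the commands (node names are placeholders):

    kubectl drain node-1 --ignore-daemonsets   # evict the pods and mark the node unschedulable
    kubectl uncordon node-1                    # mark it schedulable again after maintenance
    kubectl cordon node-2                      # only mark unschedulable, without evicting anything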

Cluster Upgrade

The versions of the core control-plane components can differ, but they should follow certain rules:

  • kube-apiserver is the primary component; no other component’s version may be higher than kube-apiserver’s
  • the other components can be one or two minor versions lower:
    • kube-apiserver: x
    • controller-manager, kube-scheduler: x, x-1
    • kubelet, kube-proxy: x, x-1, x-2
  • kubectl can be one version higher than kube-apiserver: x+1, x, x-1

Kubernetes supports only the three most recent minor versions. The recommended approach is to upgrade one minor version at a time.

How you upgrade the cluster depends on how you deployed it:

  • cloud provider: a few clicks in the UI
  • kubeadm: use the upgrade commands (upgrade the kubeadm package itself first!)
  • the hard way, from scratch: upgrade each component manually yourself

Two major steps:

  1. upgrade the master node: the control-plane components go down and all management functions are unavailable, but the applications deployed on the worker nodes keep serving
  2. upgrade the worker nodes, with one of these strategies:
    • upgrade all nodes at once, accepting downtime
    • upgrade one node at a time
    • create new nodes, move the workloads over, then finally remove the old nodes

When you run a command like kubectl get nodes, the VERSION column indicates the version of the kubelet
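A sketch of a kubeadm-based control-plane upgrade (the target version is a placeholder; remember to upgrade the kubeadm package itself first):

    kubeadm upgrade plan             # show current and available versions
    kubeadm upgrade apply v1.28.x    # upgrade the control plane on the master node
    # then upgrade the kubelet package on each node and restart it
    systemctl restart kubelet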

Backup and Restore

Master / Node DR

  • Cordon & drain
  • Provision replacement master / node

ETCD DR

Option: Backup resources

Save a copy of every object by querying the kube-apiserver
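A simple sketch (note that "all" covers only the common workload resource types):

    kubectl get all --all-namespaces -o yaml > all-resources.yaml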

Option: Backup ETCD

Make copies of the etcd data directory

Or use the etcdctl command-line tool (a combined sketch follows the steps below):

  1. Make a snapshot

    Remember to specify the certificate files for authentication
  2. Stop the kube-apiserver
  3. Restore the snapshot

    When etcd restores from a backup, it initializes a new cluster configuration and configures its members as new members of a new cluster; this prevents a new member from accidentally joining an existing cluster.
    For example, when a snapshot is used to provision a new etcd cluster for testing, you don’t want the members of the test cluster to accidentally join the production cluster.

  4. Configure etcd.service with the new data directory and the new cluster token

    During a restore, you must provide a new cluster token and the same initial cluster configuration

  5. Restart the etcd service
  6. Start the kube-apiserver
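A hypothetical snapshot/restore sequence with etcdctl (paths, endpoints, and certificate locations are placeholders):

    ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
        --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/etcd/ca.crt --cert=/etc/etcd/etcd-server.crt --key=/etc/etcd/etcd-server.key

    ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
        --data-dir=/var/lib/etcd-from-backup \
        --initial-cluster-token=etcd-cluster-restore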

Persistent Volume DR

You cannot rely on Kubernetes for backing up and restoring persistent volumes.

If you are using cloud-provider-specific persistent volumes such as EBS volumes, Azure managed disks, or GCE persistent disks, you should use the cloud provider’s snapshot APIs
