A Walk Among the Tombstones

  • Quote

"Life in prison may not reform them, but it will certainly sharpen their criminal skills."

Ah, life! Someone once said that to those who live by thinking it is a comedy, and to those who live by feeling it is a tragedy. For me, however you live it, there is both comedy and tragedy, and you cannot escape them even by doing nothing.

Understanding Nginx in Depth (深入理解 Nginx): Reading Notes, Chapter 2

Relationships Between Processes

  • Nginx can serve with only a single (master) process
  • The usual deployment uses one master process to manage multiple worker processes
  • With the number of workers equal to the number of CPU cores, process-switching overhead is minimal

Benefits of the Multi-Process Model

  1. The master process focuses purely on management, providing command-line services to the administrator (start, stop, reconfigure, upgrade); see the sketch after this list
  2. The master process needs elevated privileges and is usually started as root
  3. When one worker process fails, the other workers keep serving normally
  4. Fully exploits SMP (symmetric multiprocessing) multi-core architectures, achieving true multi-core concurrency at the micro level
  5. Workers usually never sleep: each handles many requests concurrently, unlike Apache, where each process serves only one request at a time, making process switching expensive
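
As a rough illustration (the directive and the signal commands below are standard Nginx; treat this as a minimal sketch, not a full configuration):

```nginx
# nginx.conf: one master process, one worker per CPU core
worker_processes auto;   # or an explicit number equal to the core count

# Management commands are handled by the master process:
#   nginx -s reload   re-read nginx.conf and spawn fresh workers
#   nginx -s quit     shut down gracefully
#   nginx -s reopen   reopen log files (for log rotation)
```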

Configuration Syntax

Each module has its own configuration directives of interest, and most modules are enabled only after a certain directive is read from nginx.conf. For example, the ngx_http_module is enabled only when an http {…} block is configured; only then can the modules that depend on it work.

Block Directives

  • Composed of a name and a pair of curly braces; http, server, and location are all block directives
  • The parameters it accepts depend on the module that parses the block directive
  • The curly braces indicate that the directives inside take effect together
  • Blocks can be nested, and inner blocks inherit the outer configuration directly
  • When inner and outer settings conflict, which level wins depends on the module that parses the block, e.g. the gzip switch in the sketch below
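
A minimal nginx.conf sketch of nesting and inheritance (the location path is hypothetical):

```nginx
http {
    gzip on;                  # outer setting, inherited by nested blocks
    server {
        listen 80;
        location /downloads/ {
            gzip off;         # inner setting wins inside this location
        }
    }
}
```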

Directive Syntax

  • The name must be legal (i.e., something some Nginx module wants to handle)
  • The parameters it accepts depend on the module that parses the directive
  • If any parameter contains whitespace, it must be wrapped in single or double quotes
  • Each directive ends with a semicolon, as in the example below
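
For example (a sketch; the log format string is illustrative):

```nginx
# the parameter contains spaces, so it must be quoted; every directive ends with ';'
log_format main '$remote_addr - [$time_local] "$request"';
error_page 404 /404.html;
```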


Understanding Nginx in Depth (深入理解 Nginx): Reading Notes, Chapter 1

Why Choose Nginx

  1. Faster: 1) faster response to a single request; 2) faster responses than other servers under peak load
  2. Highly extensible: 1) composed of loosely coupled modules; 2) modules are compiled into the binary and run there
  3. Highly reliable: 1) stable modules; 2) relatively independent processes; 3) a failed worker can be replaced quickly
  4. Low memory consumption: 10,000 inactive HTTP keep-alive connections consume only about 2.5 MB
  5. High concurrency: a single machine supports more than 100,000 connections
  6. Hot deployment: 1) enabled by the separation of master and worker processes; 2) the executable, configuration, and log files can be upgraded or swapped without interrupting service
  7. BSD license

Preparing for Development

Required

  1. Linux kernel 2.6 or later (epoll is needed to handle high concurrency)
  2. GCC, to compile the C source

Optional

  1. G++, to compile C++ when writing HTTP modules
  2. PCRE (Perl Compatible Regular Expressions), to use regular expressions in the configuration file; pcre-devel is needed for secondary development against PCRE
  3. zlib, to gzip-compress HTTP content and reduce network transfer
  4. OpenSSL, to support SSL, or to use MD5 or SHA hashes

Directory Structure

  1. Source code directory
  2. Intermediate build files (placed under the source directory, named objs)
  3. Deployment directory (defaults to /usr/local/nginx)
  4. Log directory

Tuning Linux Kernel Parameters

  1. Kernel parameters need adjusting so that Nginx can reach higher performance
  2. Tuning usually follows the workload: a content server, a reverse proxy, and a thumbnail server each call for different adjustments, as sketched below
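
A sketch of the kind of adjustments involved (values are illustrative assumptions for a busy reverse proxy; benchmark before adopting any of them):

```
# /etc/sysctl.conf
net.core.somaxconn = 1024     # longer accept queue for connection bursts
net.ipv4.tcp_tw_reuse = 1     # reuse TIME_WAIT sockets for new outbound connections
fs.file-max = 999999          # raise the system-wide file-descriptor limit

# apply without rebooting:
#   sysctl -p
```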


RabbitMQ Exchanges and Their Types

Definition

exchange

Message routing agents.

Messages are not published directly to a queue. Instead, the publisher sends messages to an exchange.

Defined per virtual host within RabbitMQ.

Predefined default exchanges are created when the server starts.

Clients can create their own exchanges.

Exchange parameters:

  • type
  • name
  • durability
  • auto-delete (the exchange is deleted once the last bound object is unbound from it)

binding

Link between an exchange and a queue.

Uses message attributes (the routing key or headers) to route messages to queue(s), as in the sketch below.
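
A minimal sketch with the Python pika client (the exchange, queue, and routing-key names are hypothetical):

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# declare an exchange, setting the parameters listed above
channel.exchange_declare(exchange='orders', exchange_type='direct',
                         durable=True, auto_delete=False)

# create a queue and bind it to the exchange; the routing key links the two
channel.queue_declare(queue='order_created')
channel.queue_bind(queue='order_created', exchange='orders',
                   routing_key='order.created')

# publish to the exchange, never directly to the queue
channel.basic_publish(exchange='orders', routing_key='order.created',
                      body=b'{"id": 1}')
connection.close()
```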

Exchange Types

default exchange

Has no name (usually referred to by the empty string).

Every queue is automatically bound to the default exchange.

The binding key is the queue name.

direct exchange

Similar to the default exchange, but it has a name and its bindings are not created automatically.

A message goes to the queue(s) with the binding key that exactly matches the routing key of the message.

fanout exchange

The routing key has no effect.

Routes messages to all bound queues.

topic exchange

Similar to a direct exchange, but uses a binding pattern instead of a binding key.

A message goes to the queue(s) whose binding pattern matches the routing key of the message.
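
Continuing the pika sketch above (names hypothetical): in a binding pattern, `*` matches exactly one word and `#` matches zero or more words.

```python
channel.exchange_declare(exchange='events', exchange_type='topic')
# receives order.created, order.paid, ... but not order.eu.created
channel.queue_bind(queue='billing', exchange='events', routing_key='order.*')
# receives every message published to the exchange
channel.queue_bind(queue='audit', exchange='events', routing_key='#')
```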


headers exchange

Similar to a direct exchange, but uses message headers instead of the routing key, and binding headers instead of a binding key.

If x-match=all, every bound header attribute must match.

If x-match=any, a single matching header attribute is enough.
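
Continuing the pika sketch (names and header values hypothetical), the x-match rule is passed in the binding arguments:

```python
channel.exchange_declare(exchange='docs', exchange_type='headers')
# matches only messages whose headers carry BOTH format=pdf and type=report
channel.queue_bind(queue='pdf_reports', exchange='docs', routing_key='',
                   arguments={'x-match': 'all', 'format': 'pdf', 'type': 'report'})
```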


Slowness

  • Quote

The runner, unlike the motorcyclist, is always present in his own body, forever obliged to think about his blisters and his exhaustion; when he runs, he feels his weight and his age, more conscious than ever of himself and of the time of his life. Everything changes when a man hands the capacity for speed over to a machine: from then on, his own body is out of the game, and he gives himself over to a speed that is non-corporeal and non-material: pure speed, speed existing for its own sake, ecstatic speed.

To occupy the stage, you have to push others off it, and that takes a special combat technique. The dancer's combat is what Pontevin calls moral judo. The dancer throws down a challenge to the whole world: who can appear more moral (more courageous, more honest, more sincere, more self-sacrificing, more truthful) than he? And he will use every hold that puts the other in a morally inferior position.

Conversation is not there merely to fill time; on the contrary, it is conversation that apportions time, governs it, and forces time to obey the laws it lays down.

The degree of slowness is directly proportional to the intensity of memory; the degree of speed is directly proportional to the intensity of forgetting.

Every new possibility that existence acquires, however improbable, transforms existence entirely.

"The elect" is a theological concept: it means that a person, not through any merit of his own but by a supernatural verdict, by the free, even capricious will of God, is chosen for something exceptional and extraordinary. It was from such a conviction that the saints drew the strength to endure the most brutal tortures. Theological concepts, like parodies of themselves, are projected onto our perfectly ordinary lives; each of us feels (more or less) humiliated by the banality of our life and longs to escape it, to rise above it. Each of us has known the illusion (stronger or weaker) of deserving such an elevation, of being predestined and chosen for it.

"Your dreams are like a wastebasket where I throw the pages that turn out too stupid."

The way we tell contemporary history is like a huge concert where we perform all of Beethoven's one hundred thirty-eight opuses in a row, but play only the first eight bars of each. Ten years on, if we give the same concert again, we will play only the first note of each piece: one hundred thirty-eight notes for the whole concert, presented as a single melody. And in twenty years, the whole of Beethoven's music will be reduced to one very long, high-pitched note, like the single, endless, piercing note he heard the day he went deaf.

Our era is obsessed with the speed of forgetting, and to satisfy that desire it has given itself over to the demon of speed. The era quickens its pace because it wants us to understand that it no longer wishes to be remembered, that it is tired of itself, sickened by itself, that it wants to blow out the trembling little flame of memory.

I beg you, friend. I dimly sense that our only hope rests in your capacity for happiness.

HBase Basics

  • DevOps

Apache HBase is an open-source, scalable, consistent, low-latency, random-access data store

Source: Infinite Skills

Features

Horizontally Scalable

A linear increase in servers results in a linear increase in storage capacity and I/O operations


CAP Trade-off

In terms of the CAP theorem, HBase is closer to a CP system

  • Consistency: ACID (atomicity, consistency, isolation, durability) guarantees on rows
  • Availability: response times of 2-3 ms from cache, 10-20 ms from disk
  • Partition tolerance: failures don't block the system; it may take longer to respond in order to maintain consistency

Dependencies

Apache ZooKeeper

  • Used for distributed coordination and leader election for high availability
  • Optimized to be highly available for reads
  • Not designed to scale for high write throughput

Apache Hadoop HDFS

  • Provides data durability and reliability
  • Optimized for sequential reads and writes of large files
  • Does not provide random updates, only a simple API for random reads
  • Cannot scale to tens of billions of small entities (less than a few hundred MB each)

Both systems have their strengths, but neither individually provides the same properties as HBase

Random Access

Optimized for small random reads

  • Entities indexed for efficient random reads

Optimized for high-throughput random writes

  • Updates without requiring a read first
  • Random writes via Log-Structured Merge trees (LSM); see the sketch below
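
A minimal sketch of both access patterns with the Python happybase client (it assumes a local HBase Thrift server and an existing 'users' table with an 'info' column family):

```python
import happybase

connection = happybase.Connection('localhost')  # HBase Thrift server
table = connection.table('users')

# random write: with the LSM design, an update needs no prior read
table.put(b'user#42', {b'info:name': b'Ada', b'info:city': b'London'})

# random read: a rowkey lookup, served from cache or disk
row = table.row(b'user#42')
print(row[b'info:name'])
```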

Short History

Inspired by Google's Bigtable

Bigtable: A Distributed Storage System for Structured Data (2006)

BigTable

Datastore for Google’s Web Crawl Table

  • Store web page content
  • Web URL as key
  • Use MapReduce to find links and generate backlinks
  • Calculate page rank to build the Google index

Later, it was also used as the backend for Gmail, Google Analytics (GA), Google Earth, etc.

Hadoop HDFS

Inspired by Google's distributed file system, GFS

Timeline

Since 2009, many companies (Yahoo, Facebook, eBay, etc.) have chosen HBase for large-scale production use cases

In 2015, Google announced Cloud Bigtable with HBase 1.0 compatible API support for its cloud platform users

In 2018, HBase 2.0.0 was released

HBase 3.0.0 is the next major release line

Despite being bucketed into the NoSQL category of data stores, some interesting projects are moving NoSQL back toward SQL by using HBase as the storage engine for SQL-compliant OLTP database systems (Apache Phoenix, for example).

Use Cases

HBase's strengths are its ability to scale and to sustain high write throughput

Many HBase apps are:

  • Ports from RDBMS to HBase
  • New low-latency big data apps

How to Port an RDBMS to HBase?

  • Many RDBMSs are painful to scale
  • Scaling up is no longer practical for massive data
  • Data inconsistency was not acceptable when scaling reads
  • Operations get more complicated as the number of replicas increases
  • Operational techniques are not sufficient when scaling writes

To make scaling easier, we need to discard some of the fundamental features that an RDBMS provides, such as:

  • text search (LIKE)
  • joins
  • foreign keys and constraint checks

By changing the schema so that it contains only denormalized tables, we won't incur replication I/O when sharding the RDBMS.

At that point, porting the RDBMS to HBase becomes relatively straightforward, as the hypothetical sketch below illustrates.
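
As a hypothetical illustration of that denormalization, a customers/orders join can collapse into one wide HBase row per customer (continuing the happybase sketch above; the table name and qualifier layout are assumptions):

```python
customers = connection.table('customers')  # hypothetical denormalized table
customers.put(b'customer#7', {
    b'profile:name': b'Ada',
    # one column per order, with the order id in the qualifier: no join needed
    b'orders:1001': b'{"date": "2024-01-15", "total": 30}',
    b'orders:1002': b'{"date": "2024-02-02", "total": 12}',
})
```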

Why Choose HBase Instead?

  • When your apps need high write and read throughput
  • When you are tired of the RDBMS's fragile scaling operations

Data Volumes

  • Entity data: information about the current state of a particular person or thing
  • Event data (or time-series data): records of events that are generally spaced over many time intervals

Data volume explodes when we need both of them

HBase or Not

Q: Does your app expect new data to be available immediately after an update?

  • Yes: Use HBase
    • When data is queried, it must reflect the most recent values
    • Expect query responses in milliseconds
  • No: No need for HBase

Q: Is your app analytical or operational?

  • Analytical: Not optimal for HBase
    • Looks at large sets of data
    • Often filters for a particular time range
    • Better to choose Hadoop
  • Operational: Use HBase
    • Look for single or small set of entities

Q: Does your app expect updates to be available immediately after an update?

  • Yes: Use HBase
    • Frequently modified
    • Pinpoint deletes
    • Updates must be reflected within milliseconds
  • No: No need for HBase
    • Data is append-only
    • Deletes in bulk or never
    • Updates can be ignored until the next report is run

Comparison

Workload      HBase                                                          Hadoop
Low latency   1 ms from cache; 10 ms from disk                               1 min via MR/Spark; 1 s via Impala
Random read   Rowkey is the primary index                                    The small-file problem
Short scan    Sorted and efficient                                           Bespoke partitioning can help
Full scan     Possible but non-optimal; improved perf with MR on snapshots   Optimized with MR, Hive, Impala
Updates       Optimized                                                      Not supported
