TOC
- 圖論 Graph Theory
- 廣度優先搜尋 Breadth-first Search, BFS
- 深度優先搜尋 Depth-first Search, DFS
- 最短路徑演算法 Shortest Path
圖論 Graph Theory
G(V, E)
- 由頂點(vertex, node)和連接頂點的邊(Edge)關聯成的圖形
- 其中關聯分作有方向性(directed)及無方向性(undirected)
- 須考慮最大流問題(maximum flow)
在程式語言有幾種方式來建構Graphs:
相鄰矩陣 adjacency matrixes
陣列 edge list representation
1 2 3 4 5 6 7 8 9 10 11 |
class Node: def __init__(self, name): self.name = name self.visited = False self.neighbors = [] # Nodes self.predecessor = None # use in shortest path def __repr__(self): return 'Node(name={})'.format(self.name) |
應用
- 最短路徑演算法 Shortest Path Algorithm: GPS, 高頻交易
- 生成樹協定 Spanning Tree Protocol, STP
- 爬蟲
廣度優先搜尋 Breadth-first Search, BFS
- 時間複雜度:
O(V+E)
(分別遍歷所有節點和各節點的所有鄰居) - 空間複雜度:
O(V)
(Queue中最多可能存放所有節點) - 用於有效率的迭代(traversal)
- 迭代的方式為鄰近的先訪問(level-ordered)
- 劣勢在於儲存指標記憶體空間,有時用DFS替代
- 使用FIFO的Queue來實作,Queue為空代表完成迭代
應用
- machine learning
- Edmonds-Karp演算法
- Cheney演算法
以Python實作,輸出請參考gist
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
class BFS: """ For BFS, use queue; For DFS, use stack or recursion """ def __init__(self, start): self.queue = [] self.start = start def traversal(self): self.start.visited = True self.queue.append(self.start) while self.queue: # O(V) node = self.queue.pop(0) yield node for n in node.neighbors: # O(E) if not n.visited: n.visited = True self.queue.append(n) |
深度優先搜尋 Depth-first Search, DFS
- 時間複雜度:
O(V+E)
- 空間複雜度:
O(logV)
(訪問至末端節點後LIFO,Stack最多只會同時存在logV個節點,也就是樹的高度) - 為了解Maze Problem而生的演算法
- 效能比BFS稍佳
- 使用LIFO的Stack來實作
應用
- 拓撲排序 Topological Order
- Kosaraju演算法: 推薦系統 Recommand System
- 判斷Grapth是否有cycle(DAG)
- Maze
以Python實作,輸出請參考gist
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
class DFS: """ For BFS, use queue; For DFS, use stack / recursion(os stack) """ def __init__(self, start): self.start = start def traversal(self): interface = self.stack() interface(self.start) return self.result def stack(self): self.result = [] def interface(node): self.result.append(node) node.visited = True for n in node.neighbors: if not n.visited: interface(n) return interface |
最短路徑演算法 Shortest Path
- 主要用於找出Graph中兩節點之間的最短路徑
- 已知起點或終點,求最短的遍歷:
Dijkstra
、Bellman-Ford
- 求某起點到某終點的最短路徑:
A* search
- 找全局最短路徑:
Floyd-Warshall
- 已知起點或終點,求最短的遍歷:
最短路徑的應用
- GPS
- Routing Imformation Protocal, RIP
- Avidan-Shamir method
先實作以下Dijkstra
、Bellman-Ford
演算法會用到的類別:Node、Edge
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
class Edge: def __init__(self, weight, start, target): self.weight = weight self.start = start self.target = target if self not in start.edges: start.edges.append(self) def __repr__(self): return 'Edge(weight={0}, start={1}, target={2})'.format( self.weight, self.start, self.target ) class Node: def __init__(self, name): self.name = name self.visted = False self.predecessor = None self.edges = [] # Edges self.min_cost = sys.maxsize def __repr__(self): return 'Node(name={})'.format(self.name) def __cmp__(self, other): return self.cmp(self.min_cost, other.min_cost) def __lt__(self, other): return self.min_cost < other.min_cost |
Dijkstra
- 使用Priority Queue(Heap)能達到時間複雜度:
O(V*logV + E)
- 通常用Max binary heap、Fibonacci heap、Priority queue來實現
- 是一種貪婪式演算法(都以目前的最短路徑去計算cost),同時可獲得最佳解
- 邊的權值一定要為正值(positive weight edge)
- relax a single edge at the same time
Pseudo code請參考gist
以Python實作,輸出請參考gist
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
class Dijkstra: def __init__(self, start): self.heap = [] start.min_cost = 0 heapq.heappush(self.heap, start) self.count_cost() def count_cost(self): while self.heap: node = heapq.heappop(self.heap) for edge in node.edges: cost = edge.start.min_cost + edge.weight if cost < edge.target.min_cost: edge.target.predecessor = edge.start edge.target.min_cost = cost heapq.heappush(self.heap, edge.target) def get_shortest_path(self, target): node = target path = [] while node is not None: path.append(node) node = node.predecessor return list(reversed(path)) |
Bellman-Ford
- 時間複雜度:
O(V*E)
- 效能比Dijkstra差,但能處理負邊(negtive weight edge)
- relax all edges at the same time for V-1 iteration
- 迭代針對每個邊做負環(negtive cycle, path的cost小於0)的檢查
應用
- cycle detection
- argitrage situations
Pseudo code請參考gist
以Python實作,輸出及負環測試請參考gist
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
class BellmanFord: def __init__(self, nodes, edges, start): start.min_cost = 0 self.nodes = nodes self.edges = edges self.start = start self.has_cycle = False self.count_cost() def count_cost(self): for i in range(len(self.nodes) - 1): for edge in self.edges: cost = edge.start.min_cost + edge.weight if edge.target.min_cost > cost: edge.target.min_cost = cost edge.target.predecessor = edge.start for edge in self.edges: if self.check_cycle(edge): self.has_cycle = True @staticmethod def check_cycle(edge): return (edge.start.min_cost + edge.weight) < edge.target.min_cost def get_shortest_path(self, target): if self.has_cycle: raise NotImplementedError('Negtive Cycle Detected!') node = target path = [] while node is not None: path.append(node) node = node.predecessor return list(reversed(path)) |
有向無環圖 Directed Acyclic Graph, DAG
- 沒有環,相較於有環圖能更快找到最短路徑
- 藉由拓樸排序(Topological),找到最短路徑的時間複雜度:
O(E+V)
- 正負邊皆能處理
- 效能比
Dijkstra
、Bellman-Ford
都好
應用
- 解決背包問題(Kanpsack-problem)
生成樹 Spanning Tree
- 從Graph取出一棵樹,可能有多種組合
- 樹的權重:所有邊的權重總和
- 最小生成樹:圖中權重最小的生成樹
生成樹的應用
- Big Data
- Clustering
Kruskal
- 貪婪式演算法
- 使用disjoint set或heap實作
- 時間複雜度:
O(N*logN)
(迭代時檢查cycle),最差的結果是O(E*logE)
以Python實作,Edge、Node、Disjoint Set類別實作與輸出請參考gist
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
class Kruskal: def __init__(self, nodes, edges): self.spanning_tree = [] self.edges = edges self.edges.sort() # O(NlogN) self.sets = [] self.nodes = nodes for node in nodes: self.sets.append(DisjointSet(node)) def run(self): for edge in self.edges: r1 = DisjointSet.find(edge.start) r2 = DisjointSet.find(edge.target) if r1.set_ is not r2.set_: # if in different set DisjointSet.merge(r1.set_, r2.set_) self.spanning_tree.append(edge) if len(self.spanning_tree) == len(self.nodes)-1: # If we have selected n-1 edges, all the other # edges will be discarted, so, we can stop here break |
keon/algorithms/graph/minimumspanningtree.py有更簡易的實作
排序演算法 Sorting Algorithms
- 比較排序(comparision)
bubble sort
、insertion sort
、selection sort
、merge sort
、quicksort
- lower bound: 最少要付出
O(NlogN)
來排序
- 非比較排序(non-comparision)
radix sort
、bucket sort
- lower bound最佳可至
O(N)
- 適應性(Adaptive)排序:
- 運用input已有的順序,節省比較次數
O(NlogN)
~O(N)
- 根據記憶體的使用分為:
- in-place
- stable: 當資料中有相等數值的兩元素,經過排序之後是否能夠保持原有的相對位置
- recursive
主要的精神為比較、對調:
1 2 3 |
if arr[i] > arr[j]: swap(i, j) |
冒泡排序 Bubble Sort
- 平均時間複雜度:
O(N^2)
- stable、in-place
1 2 3 4 5 6 7 8 9 10 |
def bubble_sort(arr): def swap(x, y): arr[x], arr[y] = arr[y], arr[x] iter_time = len(arr)-1 for i in range(iter_time): for j in range(0, iter_time-i): if arr[j] > arr[j+1]: swap(j, j+1) |
選擇排序 Selection Sort
- 平均時間複雜度:
O(N^2)
- 適用長度小的陣列(長度10~20),效能甚至比quicksort、mergesort佳
- 寫入次數較插入排序要少
- in-place
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
def selection_sort(arr): def swap(x, y): arr[x], arr[y] = arr[y], arr[x] for i in range(len(arr) - 1): index = i for j in range(i+1, len(arr)): if arr[j] < arr[index]: index = j if index != i: swap(index, i) return arr |
插入排序 Insertion Sort
- 平均時間複雜度:
O(N^2)
- 適用長度小的陣列(長度10~20),效能甚至比quicksort、mergesort佳
- 寫入次數較選擇排序多
- in-place、stable
1 2 3 4 5 6 7 8 9 10 11 12 |
def insertion_sort(arr): def swap(x, y): arr[x], arr[y] = arr[y], arr[x] for i in range(len(arr)): j = i while j > 0 and arr[j-1] > arr[j]: swap(j, j-1) j -= 1 return arr |
快速排序 Quick Sort
- 屬分治法(divide and conquer algorithm),可以由遞迴來實作
- 時間複雜度:
O(N^2)
~O(NlogN)
- in-place
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
def quick_sort(arr): arr = quick_sort_recur(arr, 0, len(arr)-1) return arr def quick_sort_recur(arr, first, last): if first < last: flag = partition(arr, first, last) # Start our two recursive calls quick_sort_recur(arr, first, flag-1) quick_sort_recur(arr, flag+1, last) return arr def partition(arr, first, last): def swap(x, y): if x != y: arr[x], arr[y] = arr[y], arr[x] flag = first for i in range(first, last): if arr[i] < arr[last]: # last is the pivot swap(flag, i) flag += 1 swap(flag, last) return flag |
合併排序 Merge Sort
- 屬分治法(divide and conquer algorithm),可以由遞迴來實作
- 時間複雜度:
O(NlogN)
- 空間複雜度:
O(N)
,merge時需要配置額外記憶體 - stable
- 適合用來排序linked list
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
def merge_sort(arr): def merge(left, right, merged): lc, rc = 0, 0 # left cursor, right cursor while lc < len(left) and rc < len(right): if left[lc] <= right[rc]: merged[lc+rc] = left[lc] lc += 1 else: merged[lc+rc] = right[rc] rc += 1 # Add left overs for i in range(lc, len(left)): merged[i+rc] = left[i] for i in range(lc, len(right)): merged[lc+i] = right[i] return merged if len(arr) <= 1: return arr # divide mid = len(arr) // 2 left = merge_sort(arr[:mid]) right = merge_sort(arr[mid:]) # conquer return merge(left, right, arr.copy()) |
堆積排序 Heap Sort
請參考Heap
混合式算法 Hybrid algorithms
- 旨在解決同一個問題,但提昇整體演算法效能
Time Complexity | Memory | Stable | |
---|---|---|---|
Quicksort | quadratic time O(N^2)~O(NlogN) | in-place, O(logN)~O(N) | No |
Heapsort | O(NlogN) | in-place, O(1) | No |
Merge sort | O(NlogN) | O(1)~O(N) | Yes |
內省排序 Intro Sort(used by Python)
- 當遞迴深度超過一定深度由quicksort切換至heapsort
Tim Sort(used by Python)
- 當超過一定長度由insertion sort切換至merge sort
Non-comparision based sorting
- 時間複雜度最佳能至
O(N)
計數排序 Counting Sort
- 只適用於排序整數
- 時間複雜度:
O(N+k)
,k=max-min+1 - 所需的長度取決於資料範圍k,範圍大會佔用許多記憶體
基數排序 Radix Sort
- 時間複雜度:
O(k*N)
,k=digit length of the number - stable
- 依排序順序分成LSD(least-significant-digit-first)及MSD(most-significant-digit-first)