算法数理工学第10回定兼　邦彦.

算法数理工学第10回定兼　邦彦

最大流問題とフロー増加法各枝に容量が与えられたネットワーク G = (V,E) において，ソース s  V からシンク t  V に向かってできるだけ多くのものを送ることを考える枝 (i,j) 上の流れの大きさを xij，枝 (i,j) の容量，つまり流れ xij の上限値を uij，ソースからシンクへの総流量を f とすると，問題は次のような線形計画問題として定式化できる 1 2 3 4 5 8 ソースシンク

流れ保存則容量制約条件制約条件1: s からの正味の流出量が f 制約条件3: t への正味の流入量が f
制約条件：流れ保存則制約条件1: s からの正味の流出量が f 制約条件3: t への正味の流入量が f 制約条件2: s と t 以外の節点では流入量 = 流出量容量制約条件制約条件4: 各枝の流量が非負かつ枝の容量以下

流れ保存則と容量制約条件を満たす x = (xij) をフローと呼びそれに対する f をフローの流量と呼ぶ
最大流問題は，流量が最大になるフローを求める問題最大流問題は線形計画問題なので単体法などで解くことができるが，ここではネットワーク問題の特性を利用したアルゴリズムを考える 1 2 3 4 5 8 ソースシンク

残余容量と残余ネットワークネットワークにおいて，任意の2節点 i, j  V の間に枝(i,j)と(j,i)の両方が存在することはないと仮定あるフロー x = (xij) が与えられているとする．ネットワーク G = (V,E) の各枝 (i,j)  E を容量を持つ枝 (i,j) と，容量を持つ枝 (j,i) で置き換えるなら枝 (j,i) のみ，なら枝 (i,j) のみ i j

をフロー x に対する枝 (i,j) の残余容量という
ネットワーク G の各枝 (i,j) を上のように置き換えたネットワークを，フロー x に対する残余ネットワークと言い，Gx = (V,Ex) と書く元のネットワーク G 1 2 3 4 5 8 残余ネットワーク Gx 1 2 3 4 5 6 フロー x 1 2 3 4 5 (3) (4) (2) (1) (0) (6)

そのような路を，(現在のフロー x に対する) フロー増加路と呼ぶ
残余ネットワークにおいて，ソースからシンクへの路が存在すれば，その路に沿ってフローを追加することによって，元のネットワーク上での流量 f を増やせることがわかるそのような路を，(現在のフロー x に対する) フロー増加路と呼ぶフロー増加路に沿って追加できるフローの量は，その路に含まれる枝の残余容量の最小値に等しい残余ネットワーク Gx 1 2 3 4 5 6

1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 (1) (3) (0) (2) (1) (4) (6) (4) (2) (3)
フロー x 1 2 3 4 5 (3) (4) (2) (1) (0) (6) 新しいフロー x 1 2 3 4 5 (4) (3) (1) (2) (6) 残余ネットワーク Gx フロー増加路 (流量1) 1 2 3 4 5 6

フロー増加法最大流を求めるアルゴリズム (正当性の証明は後)
(0) 適当な初期フロー x を求める例えば，全ての枝 (i,j) に対し xij = 0 とする残余ネットワーク Gx においてソースからシンクへの路 (フロー増加路) を見つける存在しなければ計算終了フロー増加路に沿って可能な限りフローを追加し，新しいフロー x を得る．ステップ(1)に戻る

初期フロー残余ネットワーク 1 2 3 4 5 (0) 1 2 4 5 3 3 5 1 5 4 3 8 フロー残余ネットワーク 1 2 3 4 5 (0) (4) 1 2 4 5 3 3 5 1 4 5 4 3 4

フロー残余ネットワーク 1 2 3 4 5 (3) (4) (0) (7) 1 2 4 2 3 3 3 5 1 1 5 4 3 7 フロー残余ネットワーク 1 2 3 4 5 (4) (3) (1) (0) (7) 1 2 4 1 2 3 5 4 1 1 1 5 4 3 7 フロー増加路が存在しないので終了フローの流量は 8

フロー増加法の正当性と最大流最小カット定理
枝の容量が全て整数なら，初期フローを整数値としてフロー増加法を用いると，ソースからシンクへの流量は1回の反復で少なくとも1は増えるよって有限回の反復の後にアルゴリズムは終了終了した時点でのフローが最大流になっていることを示す

頂点集合V をソースs を含む集合Sとシンクt を含む集合T に分割したものをカットと言い，(S,T) と表す
任意のカット (S,T) に対して，S の節点を始点とし， T の節点を終点とする枝を (i,j)  (S,T)，逆に， T の節点を始点とし，S の節点を終点とする枝を (j,i)  (T,S) と書く全ての枝 (i,j)  (S,T) の容量 uij の合計をカット (S,T) の容量と呼び，C(S,T) で表す．つまり

x を任意のフロー，f をその流量，(S,T) を任意のカットとする
が成立する全ての枝 (i,j) E に対して 0  xij  uij であるから，カット容量の定義より，f  C(S,T) を得る

フロー増加法の計算が終了した時点で得られているフローを x*，その流量を f * とする
残余ネットワークでs から到達できる点の集合をS*, その補集合を T* とする s  S* かつ t  T* より，(S*,T*) はカットとなる残余ネットワークの定義より

従って，任意のフローに対して f  C(S,T)が成り立つので，上の式は x* が実行可能な全てのフローの中で最大流量を与えるものであり，同時に，(S*,T*) が全てのカットの中で最小の容量をもつものであることを示している (最大流最小カットの定理) 以上より，フロー増加法が終了したときのフローは最大流になっている

フロー増加法の計算量とその改良容量が整数値のとき，フロー増加法では1回の反復で少なくとも1単位の流量が追加される
　⇒ 全体の反復回数は最大流量の値を越えない枝の容量の最大値を U = max{uij| (i,j)  E} とすると，最大流量の上限は mU 1つのフロー増加路を見つける計算量は O(m) 全体の計算量は O(m2U) 計算量に U が現れるので，理論的には多項式時間アルゴリズムとは言えない

Edmonds-Karpアルゴリズムフロー増加法において，フロー増加路を求める際に辺数最小のものを求める
幅優先探索で路を求めればいい反復回数が mn/2 以下になる (後述) 全体の時間計算量は O(m2n)

i 回目の反復後のフローを fi とする i+1 回目の反復では，残余ネットワークで辺数最小のフロー増加路 Pi を求め，フロー fi+1を流す補題：フローの系列 f1, f2,... に対し (a) 全ての k で |E(Pk)|  |E(Pk+1)| (b) Pk  Pl が逆辺対を持つような全ての k < l に対して， |E(Pk)| + 2  |E(Pl)|

証明：(a) Pk  Pk+1 から逆辺対を全て除去したグラフを G1 とする Pk と Pk+1 の両方に現れる辺は多重辺とする
G1 には2つの辺素な s-t パス Q1, Q2 が存在するの任意の辺はPk の辺の逆辺となるの辺は Pk にそってフローを流したときにできたので，それは流量が 0 のところにフロー Pk を流したあとの残余ネットワークの辺であるから s t Pk Pk+1 s t Q1 Q2 G1

G1 の枝は Pk  Pk+1 に含まれるが，後者の枝の中での枝は Pk の逆辺なので削除されている．よって
Q1, Q2 の枝は G1 に含まれるので，Q1, Q2 はでの増加路である Pk は辺数最小の増加路なので，|E(Pk)|  |E(Q1)| かつ |E(Pk)|  |E(Q2)| 以上より |E(Pk)|  |E(Pk+1)| (Q1とQ2はG1の辺素パス) (G1は逆辺を削除)

(b) の証明： k < i < l である全ての i に対しPi  Pl が逆辺を持たないようなk, l に対して証明できれば, (a) より全ての k < l に対しても成り立つ Pk  Pl から逆辺対を全て除去したグラフを G1 とするであり，の任意の辺は Pk, Pk+1,...,Pl-1 のいずれかに含まれる辺の逆辺であるよって，Pl の辺で Pk の逆辺でないものは，の辺か Pk+1,...,Pl-1 の辺の逆辺である k と l の定め方より，Pl は Pk+1,...,Pl-1 の辺の逆辺を持たない．

よって，Pl の辺で Pk の逆辺でないものはの辺となり，が得られる
G1 には2つの辺素な s-t パス Q1, Q2 が存在し，それらはでの増加路である Pk は辺数最小の増加路なので，|E(Pk)|  |E(Q1)| かつ |E(Pk)|  |E(Q2)| 　 (少なくとも2辺を除去しているので) より，命題が成り立つ

定理：辺数 m, 点数 n のネットワークに対して， Edmonds-Karpアルゴリズムは辺の容量に関わらず，高々 mn/2 回の反復で終了する
証明：アルゴリズムの実行中に選ばれた増加路を P1, P2,... とする増加路で可能な限りフローを追加するので，各増加路には残余ネットワークでの容量いっぱいまで流す枝が存在する．それをボトルネック辺と呼ぶ任意の辺 e に対して，e をボトルネック辺にもつ増加パスをとするとの間に，e の逆辺を含む増加路が存在する (ij < k < ij+1)

補題 (b) より，全ての j に対し e が端点として s と t を含まなければ，全ての j でとなり，e をボトルネック辺としてもつ増加路は高々 n/4 個 e が端点として s と t のいずれかを含むときは， e またはその逆辺をボトルネック辺として持つ増加路は高々1つ任意の増加路は G の辺かその逆辺を少なくとも 1本含むので，高々 2m  n/4 = mn/2 個の増加路しか選ばれない

Dinicアルゴリズムフロー増加法の各反復において，1本のフロー増加路だけを用いてフローを追加するのではなく，複数のフロー増加路を求めてそれらに同時にフローを追加するアルゴリズム Edmonds-Karpアルゴリズムでは各反復で最短の増加路を求めているが，補題 (a) より路の長さは単調増加路の長さが等しい増加路の系列をアルゴリズムのフェイズと呼ぶ 1つのフェイズの全ての増加路を同時に求めて加える

命題：あるフェイズの始まりのフローを f とする．そのフェイズの増加路は全て Gf の増加路になる．
証明：補題 (b) より，ある増加路 Pk と Pl に対して， Pk  Pl が逆辺対を持つならば |E(Pk)| + 2  |E(Pl)| なので，同じフェイズに属する増加路の共通部分は逆辺対を持たないの任意の辺は Pk, Pk+1,...,Pl-1 のいずれかに含まれる辺の逆辺であるが，今は逆辺は存在しないので，となるつまり，あるフェイズの増加路は全て Gf の増加路

初期フロー残余ネットワーク 1 2 3 4 5 (0) 1 2 4 5 3 3 5 1 5 4 3 8 フロー残余ネットワーク 1 2 3 4 5 (0) (4) 1 2 4 5 3 3 5 1 4 5 4 3 4

フロー残余ネットワーク 1 2 3 4 5 (3) (4) (0) (7) 1 2 4 2 3 3 3 5 1 1 5 4 3 7 この増加路は1つ前の残余ネットワークにも存在 (長さ3の増加路全てで成立)

残余ネットワークから，路長最小の路の集合を求める必要がある．定義：ネットワークの s-t フロー f に対して，Gf のレベルグラフは
1 2 4 残余ネットワークフロー 5 3 1 2 3 4 5 (0) (4) 3 5 1 4 5 4 3 4 2 4 レベルグラフ 1 5 3

定義：ネットワークの s-t フロー f は，が s-t パスを含まないとき，ブロックフロー (blocking flow) と呼ぶ
残余ネットワークのレベルグラフ 1 2 3 4 5 1 2 3 4 5 3/5 3/3 3/4 s-t フローが存在しない ⇒ 左はブロックフロー 1/1 2 4 2 4 4/5 1/3 3/3 1 2 1 3/4 5 1 1 5 3 3

Dinicのアルゴリズム全ての e  E(G) で f(e) = 0 とする Gf のレベルグラフを作る
レベルグラフでのブロックフロー f’ を求める． f’ = 0 ならば終了 f を f’ だけ増加させる．2. へ行く注：f を増加させるときは，逆辺に対しては f(e) := f(e)  f’(e) とする

初期フロー残余ネットワーク 1 2 3 4 5 (0) 1 2 4 5 3 3 5 1 5 4 3 8 ブロックフローレベルグラフ 2 4 2 4 1 1 5 5 3 4/4 4/8 4 3 8

フロー残余ネットワーク 1 2 3 4 5 (0) (4) 1 2 4 5 3 3 5 1 4 5 4 3 4 ブロックフローレベルグラフ 1 2 4 1/1 2 4 5 3 4/5 1/3 3 3/3 1 4 5 1 3/4 5 3 3

フロー残余ネットワーク 1 2 3 4 5 (4) (3) (1) (0) (7) 1 2 4 1 2 3 5 4 1 1 1 5 4 3 7 レベルグラフ 2 4 1 1 5 3 レベルグラフに s-t フローが存在しないので終了フローの流量は 8

Dinicアルゴリズムの計算量辺数最小の増加路の辺数はフェイズごとに厳密に増加する．よってDinicアルゴリズムは高々 n1 回のフェイズ後終了する．各フェイズではブロックフローは O(mn) 時間で求まるレベルグラフは幅優先探索で O(m) 時間レベルグラフ上で1つのs-tパスは O(n) 時間 1つのs-tパス上には少なくとも1つの飽和辺が存在するそれを削除してさらにs-tパスを探す．高々 m 回で終了全体の計算量は O(mn2) 動的木というデータ構造を用いるとブロックフローは O(m log n) 時間．全体で O(mn log n) 時間．

An O(m log m)-time Algorithm for Detecting Superbubbles
Wing-Kin Sung, Abha Belorkar, Iana Pyrogova (NUS) Kunihiko Sadakane, Tetsuo Shibuya (U. Tokyo) Dec. 17, 2014

de Novo Assembly Assemble a genome from reads
cf. re-sequencing (i.e., mapping to the reference) Single Reads Paired Reads Assemble Genome (contigs / scaffolds)

Standard Assembly Strategy
Overlap-Layout-Consensus Approach Construct a graph that represents relationships between reads The overlap graph or the de Bruijn graph is often used Layout Decide the order of reads on the genome by using the graph Consensus Align reads following the order decided in the layout phase The consensus genome (contig) alignment

de Bruijn Graph The k-dim. de Bruijn graph G(V,E) for string T
V: set of all k-mers (length-k substrings) in T E: {u  v | u, v  V, u = c1s, v = sc2, i c1sc2 = T[i..i+k]} (s is (k-1)-mer) T = $$$TACGACGTCGACT$ TAC ACG G CGA CGT A T GAC GTC C ACT TCG $ $TA $$T $$$ k = 3

Graph Modification Unipath concatenation Error correction
We obtain smaller graph Error correction We must remove some subgraph structures caused by sequencing errors The following simple structures can be detected in linear time unipath Tip Bubble Crosslink

Real de Bruijn graphs are complicated...
Due to sequencing errors or real repeats This kind of structures cannot be detected by tips, bubbles, cross links cggcacaaaaa　　　 tatgaggaaaaacaggg aggatatg　　 att aaa agtt cagtttgtattttttgttgagtgaatgt ct ccag t c ata gagatgcaagtgtagatacacag ta aga tagatgcaagtgtagatacacag　ta aga gcag t c　　 cta　　　　 tagatgcaagtgtagatacacag　ta tagtttgtattttttgttgagtgaatgt tggcacaaaaa attg cggaaaaaacagggaggatatgatt　　　　ata agttcagttt a tgttttttgtt gagtgaatgtctccag cc　ata gagatgcaagtgtagatacacag t c g ggttttttgtt gagtgaatgtctccagt tgttttttgtt　gagtgaatgtctccag gggaaaaaacagggaggatatgatt ctg　gggaaaaaacaggg　aggatatg att ttt ata agttcagttt　　　g ggttttttgtt tatgaggaaaaacaggg 28 18 2 34 3 6 25 23 5 17 11 10 4 27 1 Superbubbles

Superbubbles [Onodera, Sadakane, Shibuya WABI2013]
Def. For distinct vertices s, t in a directed graph, the pair s, t is called a superbubble if it satisfies reachability t is reachable from s matching the set of vertices reachable from s without passing through t is equal to the set of vertices from which t is reachable without passing through s acyclicity minimality

Examples Superbubbles Not superbubbles s t cycle reachable from s
sink　t source s s t cycle reachable from s not reachable to t s s t t reachable to t not reachable from s

Our algorithm: worst case O(m log m) time
Existing Algorithm Onodera et al.’s algorithm [WABI13] applies topological sort from each vertex O(nm) time in the worst case (n: #vertices, m: #edges) O(m) time on average A worst case will happen if there exist nested superbubbles Our algorithm: worst case O(m log m) time

Observations Lemma 3.1: (p, c) is an edge where p has one child and c has one parent ⇒ p, c is a superbubble For any superbubble s, t, there must exist some parent p of t such that p has exactly one child t p p c t

Observations Lemma 3.2: Consider a vertex c that has more than one parent. Let p be a parent of c that has exactly one child. Then there is no superbubble of the form p, t. p c

Observations Lemma 3.3: Consider a vertex c in G that has more than one parent. Let G’ be a graph formed by merging p and c as c for all parents p of c that has exactly one child. Apart from the superbubbles that end at these p, G’ and G have the same set of superbubbles. p1 p2 c c G G’

Observations Lemma 3.4: Consider a directed acyclic graph G. Suppose a superbubble s, t is nested in another super bubble s’, t’. The topological order of t is smaller than that of t’. s’ s t’ t

O(m log m) time Algorithm for DAG
for each vertex t in topological order while t has at least two parents and some parent p of t has exactly one child for each parent p of t with exactly one child, merge p and t as t if t has exactly one parent p and p has exactly one child t report p, t merge p and t as t

3 p 5 t 1 2 4 7 9 6 8

3 t 1 2 4 7 9 6 p 8

3 t 1 2 4 7 9 p 8

3 t 1 2 7 9 p 8 2, 7 is a superbubble

3 p t 1 7 9 8

t p 1 7 9 8

t 1 9 8 p

p t 1 9 1, 9 is a superbubble

Analysis For each node v, we keep a list of its parents
The list is maintained using the AVL tree. Two lists of parents Pu and Pv can be merged in time The total merging time is bounded by the merging time for all lists, which is O(m log m)

Algorithm for General Graphs
partition the input graph H into subgraphs {G1,…, Gk} for each subgraph G  {G1,…, Gk} if G is acyclic find superbubbles in G else transform G into a DAG G’ find superbubbles in G’

Observations Lemma 4.1: Consider a superbubble s, t. Let U be the set of vertices appear in some paths from s to t. Let C be a non-singleton strongly connected component If U  C  , both s and t belong to C. For each non-singleton strongly connected component Ci, we create a subgraph Gi 3 5 s t r 1 2 4 7 9 r’ 6 8

Observations Let Gr denote the graph consisting of singleton SCCs and edges between them. Note: Gr is a DAG. Lemma 4.2: Let S be the set of superbubbles in H. Let SS be the set of superbubbles in G1,…, Gk, Gr which do not involve artificial vertices. Then S = SS. What remains is to find superbubbles in SCCs. We convert each Gi into a DAG.

Converting SCC to DAG G G’ Fix a DFS tree of G from r
For each node v, create v’ and v’’ For each back edge (u, v), create (u’, v’’) For other edges (u, v), create (u’, v’) and (u’’, v’’) Remove (r, v’’) and (u’, r’) 4’ r 2’ 3’ 5’ 4 r 2 3 5 r’ 4’’ G G’ 2’’ 3’’ 5’’ r’

Observations Let T be a DFS tree. Lemma 4.3: For any s, t  V(T) such that s is ancestor of t in T. s, t is a superbubble iff both s’, t’ and s’’, t’’ are superbubbles in G’. Lemma 4.6: For any s, t  V(T) such that t is ancestor of s in T. s, t is a superbubble iff s’, t’’ is a superbubbles in G’.

G G’ 3, 5 is a superbubble 3’, 5’ is a superbubble
4’ r 2’ 3’ 5’ 4 r 2 3 5 r’ 4’’ G G’ 2’’ 3’’ 5’’ r’ 3, 5 is a superbubble 3’, 5’ is a superbubble 3’’, 5’’ is a superbubble

G G’ 1, 4 is a superbubble 1’, 4’’ is a superbubble r 2’ 1’ 4’ 2
3’ 1 4 3 2’’ 1’’ 4’’ 1, 4 is a superbubble 3’’ r’ 1’, 4’’ is a superbubble

Conclusion We proposed a worst case O(m log m) time algorithm for computing superbubbles In practice, Onodera et al.’s algorithm is faster Open problem: linear time algorithm

算法数理工学第10回定兼　邦彦.

Similar presentations

Presentation on theme: "算法数理工学第10回定兼　邦彦."— Presentation transcript:

Similar presentations

About project

フィードバック

ログインする

Auth with social network:

算法数理工学 第10回 定兼 邦彦.

Similar presentations

Presentation on theme: "算法数理工学 第10回 定兼 邦彦."— Presentation transcript:

Similar presentations

About project

フィードバック

算法数理工学第10回定兼　邦彦.

Presentation on theme: "算法数理工学第10回定兼　邦彦."— Presentation transcript: