Presentation transcript:

1 Answers to the previous exercise

Message:     A      B      C      D      E      F      G      H
Probability: 0.363  0.174  0.143  0.098  0.087  0.069  0.045  0.021
Codeword:    0      100    110    1010   1011   1110   11110  11111

Q1) Construct a binary Huffman code for the messages in the table, where the numbers are the probabilities with which the messages occur.
A1) The codeword table above (read off the code tree on the slide) is one example. A Huffman code is not unique, so you may have obtained a Huffman code different from this one.
Q2) Compute the average codeword length of the constructed code.
A2) 0.363×1 + 0.174×3 + ... + 0.021×5 = 2.660 bits per message. Even if you constructed a Huffman code different from the one above, its average codeword length must coincide with 2.660.
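The construction can be checked mechanically. Below is a minimal Python sketch of my own (not part of the slides) of the textbook Huffman procedure; tie-breaking between equal-probability trees is arbitrary, so the tree may differ from the slide's, but the average length must still come out 2.660.

```python
import heapq

def huffman_lengths(probs):
    """Return binary-Huffman codeword lengths for the given probabilities."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]  # (prob, tiebreak, leaf indices)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tiebreak = len(probs)
    while len(heap) > 1:
        p1, _, m1 = heapq.heappop(heap)
        p2, _, m2 = heapq.heappop(heap)
        for i in m1 + m2:            # every leaf under a merge gets one bit deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, tiebreak, m1 + m2))
        tiebreak += 1
    return lengths

probs = [0.363, 0.174, 0.143, 0.098, 0.087, 0.069, 0.045, 0.021]
lengths = huffman_lengths(probs)
print(lengths)                                     # [1, 3, 3, 4, 4, 4, 5, 5]
print(sum(p * l for p, l in zip(probs, lengths)))  # 2.660
```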

2 Answers to the previous exercise (cont'd)

Q3) Construct a 4-ary Huffman code for the same messages.
A3) We build a code tree of degree four (a node can have at most four children). Is it enough simply to merge the four smallest-probability trees at each step? Not quite: we may run out of nodes at the end. One merge removes 4 − 1 = 3 trees, so to finish with exactly one tree we must start with 3k + 1 of them. Starting from the eight message nodes, the first merge leaves five trees (= 8 − 4 + 1) and the second leaves two (= 5 − 4 + 1); the final merge would then produce a root with only two children. Such a code tree is not good, because it does not fully use the "shallow node positions": a leaf at a shallow position means a short codeword, and it is a bad strategy to leave shallow positions unused. To avoid this, we add two dummy messages of probability 0 before running the Huffman algorithm, so that the number of trees becomes 10 = 3×3 + 1. The number of trees then decreases by three in each iteration, and the final iteration merges four trees, fully occupying the shallow positions. With code symbols {a, b, c, d}, the resulting code is:

Message:  A  B  C  D   E   F   G    H
Codeword: a  b  c  da  db  dc  dda  ddb
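The dummy-message trick fits naturally into code as well. This is a sketch of my own (dary_huffman_lengths is a hypothetical name), padding with zero-probability messages until the count is congruent to 1 modulo d − 1:

```python
import heapq

def dary_huffman_lengths(probs, d=4):
    """D-ary Huffman codeword lengths, padding with zero-probability dummies."""
    probs = list(probs)
    n_real = len(probs)
    while (len(probs) - 1) % (d - 1) != 0:   # need 3k + 1 trees when d = 4
        probs.append(0.0)                    # add a dummy message
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tiebreak = len(probs)
    while len(heap) > 1:
        merged_p, merged_m = 0.0, []
        for _ in range(d):                   # merge the d smallest trees
            p, _, m = heapq.heappop(heap)
            merged_p += p
            merged_m += m
        for i in merged_m:
            lengths[i] += 1                  # one d-ary code symbol deeper
        heapq.heappush(heap, (merged_p, tiebreak, merged_m))
        tiebreak += 1
    return lengths[:n_real]

probs = [0.363, 0.174, 0.143, 0.098, 0.087, 0.069, 0.045, 0.021]
print(dary_huffman_lengths(probs))  # [1, 1, 1, 2, 2, 2, 3, 3], matching a ... ddb
```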

3 Today's lecture

- The performance of the Huffman code and its limits
- Extensions of Huffman coding:
  - block coding and the average codeword length
  - Shannon's source coding theorem: the theoretical limit of source coding (data compression)
  - block Huffman coding with unequal block lengths
  - run-length Huffman coding

4 On the average codeword length

Properties required of a source code:
- it is immediately decodable (which automatically implies that it is uniquely decodable);
- its average codeword length is as small as possible.
This makes the Huffman code a strong candidate. But how "good" a code is the Huffman code? To answer this question, we investigate a theoretical bound on the average codeword length. (The slide shows the class of immediately decodable codes, containing the Huffman codes, inside the class of uniquely decodable codes.)

5 A bound on the average codeword length

Consider encoding the messages generated by a stationary source S, one at a time, with an immediately decodable code C. There are M messages, occurring with probabilities p_1, ..., p_M.
[Theorem]
Part 1: for any such code, the average codeword length satisfies L ≥ H_1(S). However efficient the code, the average codeword length is at least the first-order entropy.
Part 2: a code whose average codeword length satisfies L < H_1(S) + 1 can be constructed. The average codeword length can be brought to within one bit of the first-order entropy. (A proof is given later.)
So we cannot make the average codeword length smaller than the entropy; interestingly, this lower bound is nearly achievable.
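As a quick numerical sanity check (my own, not from the slides), the theorem can be verified against the code of slide 1:

```python
from math import log2

probs = [0.363, 0.174, 0.143, 0.098, 0.087, 0.069, 0.045, 0.021]
lengths = [1, 3, 3, 4, 4, 4, 5, 5]           # codeword lengths from slide 1
H1 = -sum(p * log2(p) for p in probs)        # first-order entropy, ~2.59 bits
L = sum(p * l for p, l in zip(probs, lengths))
print(H1, L)                                 # 2.59... and 2.660
assert H1 <= L < H1 + 1                      # both parts of the theorem hold here
```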

6 Shannon's lemma

To prove the theorem, we introduce one auxiliary result (Shannon's lemma), named after Shannon. Here p_1, ..., p_M are the message probabilities of S.
[Lemma] For any positive numbers q_1, ..., q_M with q_1 + ... + q_M ≤ 1,
  −Σ_{i=1..M} p_i log2 q_i ≥ −Σ_{i=1..M} p_i log2 p_i.
Equality holds if and only if p_1 = q_1, ..., p_M = q_M.

7 Proof of Shannon's lemma

Consider (left-hand side − right-hand side) of the inequality and convert the logarithm to base e. The log value is bounded using −log_e x ≥ 1 − x; the line y = 1 − x lies below the curve y = −log_e x and touches it only at x = 1, as the graph on the slide illustrates. This yields (lhs − rhs) ≥ 0, with equality if and only if q_i / p_i = 1 for all i = 1, ..., M.
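Written out, the computation behind this proof runs as follows (a reconstruction of the standard argument, with notation as in the lemma; the slide's displayed equations did not survive the page extraction):

```latex
\begin{aligned}
(\mathrm{lhs}) - (\mathrm{rhs})
  &= -\sum_{i=1}^{M} p_i \log_2 q_i + \sum_{i=1}^{M} p_i \log_2 p_i
   = \frac{1}{\ln 2} \sum_{i=1}^{M} p_i \left( -\ln \frac{q_i}{p_i} \right) \\
  &\ge \frac{1}{\ln 2} \sum_{i=1}^{M} p_i \left( 1 - \frac{q_i}{p_i} \right)
   \qquad \text{since } -\ln x \ge 1 - x \\
  &= \frac{1}{\ln 2} \left( \sum_{i=1}^{M} p_i - \sum_{i=1}^{M} q_i \right)
   \ge \frac{1}{\ln 2}\,(1 - 1) = 0 ,
\end{aligned}
```

with equality throughout if and only if q_i / p_i = 1 for every i.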

8 Proof of the bound theorem (part 1)

Let l_1, ..., l_M be the codeword lengths of C and put q_i = 2^{−l_i}. By Kraft's inequality, q_1 + ... + q_M ≤ 1. Since l_i = −log2 q_i, the average codeword length is L = Σ p_i l_i = −Σ p_i log2 q_i. Applying Shannon's lemma, this is at least −Σ p_i log2 p_i, which is exactly the first-order entropy H_1(S). Equality holds if and only if p_1 = q_1, ..., p_M = q_M. (End of the proof of part 1.)
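In symbols, the whole of part 1 is the chain (again a reconstruction of the slide's lost displays):

```latex
\sum_{i=1}^{M} q_i = \sum_{i=1}^{M} 2^{-l_i} \le 1 \quad \text{(Kraft)}, \qquad
L = \sum_{i=1}^{M} p_i l_i = -\sum_{i=1}^{M} p_i \log_2 q_i
  \ge -\sum_{i=1}^{M} p_i \log_2 p_i = H_1(S).
```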

9 Proof of the bound theorem (part 2)

Choose integers l_i satisfying −log2 p_i ≤ l_i < −log2 p_i + 1. This gives 2^{−l_i} ≤ 2^{log2 p_i} = p_i, so l_1, ..., l_M satisfy Kraft's inequality, and an immediately decodable code with these codeword lengths can be constructed. Its average codeword length is upper-bounded as in the chain below, giving L < H_1(S) + 1. (End of the proof of part 2.)
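The corresponding chain for part 2, reconstructed in the same way:

```latex
\begin{gathered}
-\log_2 p_i \le l_i < -\log_2 p_i + 1
  \;\Longrightarrow\;
  2^{-l_i} \le 2^{\log_2 p_i} = p_i
  \;\Longrightarrow\;
  \sum_{i=1}^{M} 2^{-l_i} \le \sum_{i=1}^{M} p_i = 1, \\
L = \sum_{i=1}^{M} p_i l_i
  < \sum_{i=1}^{M} p_i \left( -\log_2 p_i + 1 \right) = H_1(S) + 1 .
\end{gathered}
```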

10 The bound theorem and Huffman codes

[Theorem (restated)] For any code, the average codeword length satisfies L ≥ H_1(S); and a code with L < H_1(S) + 1 can be constructed.
The Huffman code always satisfies part 2 of the theorem, and in fact a stronger statement can be proved: there is no immediately decodable code whose average codeword length is smaller than that of the Huffman code. For this property, a Huffman code is called a compact code. The proof is by induction on the size of the code tree and is omitted here.

11 Huffman codes and blocking

A Huffman code, as used so far, converts one message into one codeword. Every codeword has length at least one, so the average codeword length is necessarily at least one. This makes plain Huffman coding unsuited to encoding a binary source, as the table illustrates:

Message:  A    B    average codeword length
Prob.:    0.8  0.2
C1:       0    1    1.0
C2:       1    0    1.0

We have two candidate codes, C1 and C2, but neither reduces the average codeword length below 1. Suppose instead that we encode several messages together (block coding), assigning one codeword to a block of messages such as "ABA". This gives us a chance to bring the average codeword length per message below one, and hence to "compress" binary data: Huffman coding becomes usable for binary sources as well. (The slide sketches blocks of a message sequence being mapped to codewords such as 10, 1101, 01.)

12 A sketch of block coding

Message sequence: ABCBCBBCAA... → blocked message sequence: AB | CBC | BB | CAA | ...
This slide sketches the overall process. A sequence of messages is first separated into blocks. Many algorithms can perform the separation, but they fall into two types: equal-length and unequal-length blocking. The unequal-length algorithms come in several variations, such as the block-partitioning scheme and the run-length scheme. Either way, we obtain a sequence of blocks of messages. If we know the statistics of the original source, we can compute the statistics of this blocked source and construct a Huffman code optimized for it; Huffman encoding the blocked sequence then yields the codeword sequence.

13 Equal-length block coding, example 1

Message:   A    B    C
Prob.:     0.6  0.3  0.1
Codeword:  0    10   11

Encoding one message at a time with the Huffman code in the table gives average codeword length 0.6×1 + 0.3×2 + 0.1×2 = 1.4.
Now consider blocks of length two, treating e.g. "AA" as a single message. Assuming the source is memoryless, P(AA) = 0.6×0.6 = 0.36, and similarly for the other block patterns:

Block:     AA    AB    AC    BA    BB    BC     CA    CB      CC
Prob.:     0.36  0.18  0.06  0.18  0.09  0.03   0.06  0.03    0.01
Codeword:  0     100   1100  101   1110  11110  1101  111110  111111

A Huffman code for these nine messages has average codeword length 0.36×1 + 0.18×3 + ... + 0.01×6 = 2.67, but this covers two message symbols: dividing by two, 2.67 / 2 = 1.335 bits per message. The efficiency improved slightly.
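These figures can be reproduced with a few lines of Python (a sketch of my own; it uses the fact that the average codeword length equals the sum of the probabilities of all merged nodes, and assumes a memoryless source):

```python
import heapq
from itertools import product
from math import prod

def huffman_avg_len(probs):
    """Average codeword length of a binary Huffman code for `probs`."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged          # each merge adds one bit to every leaf below it
        heapq.heappush(heap, merged)
    return total

p = {'A': 0.6, 'B': 0.3, 'C': 0.1}
print(huffman_avg_len(p.values()))                           # 1.4
blocks = [prod(p[s] for s in t) for t in product(p, repeat=2)]
print(huffman_avg_len(blocks), huffman_avg_len(blocks) / 2)  # 2.67, 1.335
```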

14 Equal-length block coding, example 2-1

Message:   A    B
Prob.:     0.8  0.2
Codeword:  0    1

Without blocking, the average codeword length is 0.8×1 + 0.2×1 = 1.0.
With blocks of length two, P(AA) = 0.8×0.8 = 0.64, and:

Block:     AA    AB    BA    BB
Prob.:     0.64  0.16  0.16  0.04
Codeword:  0     10    110   111

The average codeword length is 0.64×1 + 0.16×2 + 0.16×3 + 0.04×3 = 1.56 per block, i.e. 1.56 / 2 = 0.78 per message. The efficiency improves even for a binary source: we can "compress" binary data although the output alphabet is itself binary.

15 Equal-length block coding, example 2-2

Block:     AAA    AAB    ABA    ABB    BAA    BAB    BBA    BBB
Prob.:     0.512  0.128  0.128  0.032  0.128  0.032  0.032  0.008
Codeword:  0      100    101    11100  110    11101  11110  11111

With blocks of length three, P(AAA) = 0.8^3 = 0.512, and the Huffman code above has average codeword length 0.512×1 + ... + 0.008×5 = 2.184 per block, i.e. 2.184 / 3 = 0.728 per message: the length is reduced again.

Block length:                         1    2     3
Average codeword length per message:  1.0  0.78  0.728

Enlarging the block length makes the per-message average codeword length smaller (the efficiency better), so we conjecture that longer blocks keep helping.
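Sweeping the block length (my own experiment, reusing the huffman_avg_len sketch from slide 13) shows the trend continuing:

```python
import heapq
from itertools import product
from math import prod, log2

def huffman_avg_len(probs):          # same helper as in the slide-13 sketch
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

p = {'A': 0.8, 'B': 0.2}
print('H(S) =', -sum(q * log2(q) for q in p.values()))   # 0.7219...
for n in range(1, 9):
    blocks = [prod(p[s] for s in t) for t in product(p, repeat=n)]
    print(n, huffman_avg_len(blocks) / n)                # 1.0, 0.78, 0.728, ...
```

The per-message length approaches (not necessarily monotonically) the value H(S) ≈ 0.722, which is exactly the limiting entropy discussed on the following slides.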

16 Average codeword length of block codes

Enlarging the block length decreases the per-message average codeword length. How small can it get? A block of n messages of S is exactly one message of the n-th order extended source S^n; that is, we are defining one codeword per message of S^n, so the bound theorem from the first half of the lecture (slide 5) applies. Letting L_n be the average codeword length of the Huffman code defined for blocks of length n,
  H_1(S^n) ≤ L_n < H_1(S^n) + 1,
and converting to the average codeword length per message,
  H_1(S^n)/n ≤ L_n/n < H_1(S^n)/n + 1/n.

17 Shannon's source coding theorem

H_1(S^n) / n is the n-th order extended entropy H_n(S) of the source S. As n → ∞, H_1(S^n) / n tends to H(S), the limiting entropy of S, while 1/n tends to 0. This is summarized in Shannon's famous source coding theorem:
[Shannon's source coding theorem] The messages generated by a source S can be encoded by an immediately decodable code whose average codeword length per message satisfies H(S) ≤ L < H(S) + ε, where ε is an arbitrarily small positive value.

18 What the source coding theorem means

[Theorem (restated)] The messages generated by a source S can be encoded by an immediately decodable code whose average codeword length per message satisfies H(S) ≤ L < H(S) + ε, for arbitrarily small ε > 0.
What does this mean? In theory, using Huffman codes with longer and longer blocks is the best possible strategy in terms of efficiency; but however hard we try, we cannot break through the "great wall" of the limiting entropy. For the source with P(A) = 0.8, P(B) = 0.2, the per-message average codeword lengths 1.0, 0.78, 0.728, ... (block lengths 1, 2, 3, ...) approach, but never go below, H(S) = 0.723.

19 Practicality of block coding

In terms of efficiency, blocking plus Huffman coding is the best method: by enlarging the block length, we can get arbitrarily close to the theoretical limit. From a practical viewpoint, however, there are drawbacks:
- The occurrence probabilities of the messages must be known in advance (touched on in the next lecture).
- The definition of the code, i.e. the correspondence table between blocks and codewords, becomes huge; encoders and decoders must store it. If one byte is needed to store one row of the table:
    block length 8  →  256 bytes
    block length 16 →  64 kilobytes
    block length 32 →  4 gigabytes
The table size affects the size of the implementation, such as circuits and programs, so in practice it is difficult to obtain good codes using anything but short block lengths.

20 Unequal-length block coding

The block coding considered so far encodes patterns of one common length, so codewords must be defined even for patterns whose occurrence probability is very small. This is unavoidable as long as we enumerate all blocks of a given length, and it is what makes the correspondence table huge: it is filled with "useless" entries. We can get around the problem by allowing the patterns to have different lengths: unequal-length blocking. In defining the pattern set, we must ensure that
- the occurrence probabilities of the patterns are reasonably balanced, and
- any message sequence can be partitioned into patterns from the set.
How to define the patterns is the key question. (The slide illustrates a sequence A B A A B A B A being partitioned into such patterns.)

21 Defining the patterns: block partitioning

Strategy 1: block partitioning (refining frequent patterns).
1. Start with the patterns of length one.
2. Take a pattern with high occurrence probability and refine it, splitting it by appending one symbol at its end; this replaces it with several patterns of smaller probability.
3. Repeat step 2 until the desired number of patterns is reached (i.e. as long as the table size allows).
Example: P(A) = 0.8, P(B) = 0.2, desired pattern count 4 (a code sketch follows):
  {A: 0.8, B: 0.2}
  → {AA: 0.64, AB: 0.16, B: 0.2}
  → {AAA: 0.512, AAB: 0.128, AB: 0.16, B: 0.2}
One can check that any sequence of A's and B's can be blocked using the patterns {AAA, AAB, AB, B}.
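A small sketch of this strategy (my own illustration; partition_blocks is a hypothetical helper, and the source is assumed memoryless):

```python
def partition_blocks(sym_probs, target):
    """Refine the most probable pattern until `target` patterns exist."""
    patterns = dict(sym_probs)                  # step 1: length-1 patterns
    while len(patterns) < target:
        best = max(patterns, key=patterns.get)  # step 2: most probable pattern
        p = patterns.pop(best)
        for s, q in sym_probs.items():          # split by appending one symbol
            patterns[best + s] = p * q
    return patterns

print(partition_blocks({'A': 0.8, 'B': 0.2}, 4))
# {'B': 0.2, 'AB': 0.16..., 'AAA': 0.512..., 'AAB': 0.128...}
```

Because every split replaces a pattern by all of its one-symbol extensions, the patterns always form the leaves of a complete tree, which is why any sequence can be blocked.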

22 Average codeword length with unequal-length blocks

Pattern:   AAA    AAB    AB    B
Prob.:     0.512  0.128  0.16  0.2
Codeword:  0      100    101   11

Define a Huffman code for the patterns of the previous slide (table above); for instance, the pattern "AAB", which occurs with probability 0.128, is assigned the codeword "100". Does unequal-length blocking actually yield short codes? We must examine the average codeword length, which is slightly more involved than in the equal-length case. Suppose the source generates many messages and the message sequence is partitioned into n blocks; each block is encoded to one codeword, so we obtain n codewords. Since the patterns occur with the probabilities in the table, about 0.512n of the n codewords are "0", a codeword of length one, and so on: the expected length of the codeword sequence is
  0.512n×1 + 0.128n×3 + 0.16n×3 + 0.2n×2 = 1.776n bits.
On the input side, the encoder consumes
  0.512n×3 + 0.128n×3 + 0.16n×2 + 0.2n×1 = 2.44n message symbols.
In summary, 1.776n / 2.44n = 0.728 bits represent one message symbol: about the same efficiency as equal-length blocks of length 3 (cf. slide 15), with a correspondence table half the size (4 rows instead of 8).
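Checking the arithmetic (my own verification):

```python
# pattern -> (occurrence probability, codeword length from the table above)
patterns = {'AAA': (0.512, 1), 'AAB': (0.128, 3), 'AB': (0.16, 3), 'B': (0.2, 2)}
bits = sum(p * clen for p, clen in patterns.values())         # 1.776 per block
syms = sum(p * len(pat) for pat, (p, _) in patterns.items())  # 2.44 per block
print(bits, syms, bits / syms)                                # 0.728...
```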

23 Defining the patterns: using run lengths

Strategy 2: define the patterns by runs of a particular symbol (the run-length scheme). A run is a subsequence consisting of a single repeated symbol. Consider the sequence A B B A A A A A B A A A B, which has many A's and few B's. The first two characters "AB" can be regarded as a run of A's of length one: one A followed by B. The "B" at the third position is a run of A's of length zero: zero A's followed by B. They are followed by a run of length 5 and a run of length 3. If the length of each run is given, the original sequence can be reconstructed, so let us encode the run lengths!

24 An upper bound on run lengths

Very long runs can occur, and it is not practical to define a codeword for each of them, since they will rarely be used. We therefore set an upper bound on the run length and represent any longer run as a concatenation of shorter ones. With the upper bound set to 3:

Run length:      0  1  2  3    4    5    6      7      ...
Representation:  0  1  2  3+0  3+1  3+2  3+3+0  3+3+1  ...

So four run patterns are used: a run of A's of length zero ("B": zero A's occur, then B), length one ("AB"), length two ("AAB"), and "3+" ("AAA": three or more A's occur, the symbol after the third A not yet read). Any string can be represented as a concatenation of such runs; for example, ABBAAAAABAAAB becomes 1, 0, 3+, 2, 3+, 0.
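A sketch of the splitting procedure (runs_with_cap is a hypothetical helper of my own; a "3+" is emitted as soon as three A's have been seen, without consuming the following symbol):

```python
def runs_with_cap(seq, symbol='A', cap=3):
    """Split `seq` into runs of `symbol`, capped at length `cap`."""
    out, count = [], 0
    for ch in seq:
        if ch == symbol:
            count += 1
            if count == cap:        # cap reached: emit "3+" and start a new run
                out.append(f'{cap}+')
                count = 0
        else:                       # a non-`symbol` character closes the run
            out.append(str(count))
            count = 0
    return out                      # (a trailing unterminated run is ignored here)

print(runs_with_cap('ABBAAAAABAAAB'))   # ['1', '0', '3+', '2', '3+', '0']
```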

25 Run-length Huffman codes

A run-length Huffman code is a Huffman code defined for the run lengths of a particular symbol in the message sequence. Seen differently, each run of the previous slide is a block pattern, so this is one realization of unequal-length block coding. It is especially effective when the message probabilities are strongly biased. Consider the case where A occurs with probability as high as 0.9: for P(A) = 0.9, P(B) = 0.1,

Run length:  0    1     2      3+
Pattern:     B    AB    AAB    AAA
Prob.:       0.1  0.09  0.081  0.729
Codeword:    10   110   111    0

Encoding examples with this code (concatenate the codewords of each run):
  ABBAAAAABAAAB  → 1, 0, 3+, 2, 3+, 0 → 110 10 0 111 0 10
  AAAABAAAAABAAB → 3+, 1, 3+, 2, 2    → 0 110 0 111 111
  AAABAAAAAAAAB  → 3+, 0, 3+, 3+, 2   → 0 10 0 0 111
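Combining the splitter sketched after slide 24 with this codeword table reproduces the three examples (my own check; runs_with_cap is the helper from that sketch):

```python
code = {'0': '10', '1': '110', '2': '111', '3+': '0'}
for s in ('ABBAAAAABAAAB', 'AAAABAAAAABAAB', 'AAABAAAAAAAAB'):
    runs = runs_with_cap(s)                 # helper from the slide-24 sketch
    print(s, runs, ''.join(code[r] for r in runs))
# 110100111010 / 01100111111 / 01000111 (the concatenated codewords)
```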

26 Encoding examples

To summarize today's class, we compare several encoding techniques for S, a memoryless stationary binary source with P(A) = 0.9, P(B) = 0.1. Its entropy is H(S) = −0.9 log2 0.9 − 0.1 log2 0.1 = 0.469.
Plain Huffman coding (A → 0, B → 1): the average codeword length is 1.
Equal-length block Huffman coding with n = 3:

Block:     AAA    AAB    ABA    ABB    BAA    BAB    BBA    BBB
Prob.:     0.729  0.081  0.081  0.009  0.081  0.009  0.009  0.001
Codeword:  0      100    110    1010   1110   1011   11110  11111

Average codeword length 1.661 per block, i.e. 1.661 / 3 = 0.55 per message: we must manage a fairly big code table, but the per-message length drops to 0.55.

27 Encoding examples (cont'd)

Run-length Huffman coding, with the number of codewords set to 8 (run lengths 0 to 6, plus "7+"):

Run length:  0    1     2      3      4      5      6      7+
Prob.:       0.1  0.09  0.081  0.073  0.066  0.059  0.053  0.478
Codeword:    110  1000  1001   1010   1011   1110   1111   0

This is unequal-length block coding, using the capping technique of slide 24. (Small remark: a run of length k contains k + 1 message symbols, namely k A's and the B that follows them; a "7+" run contains 7 A's.) Over n runs, the codeword sequence has expected length
  0.1n×3 + (0.09 + 0.081 + 0.073 + 0.066 + 0.059 + 0.053)n×4 + 0.478n×1 = 2.466n bits,
and the message sequence has expected length
  0.1n×1 + 0.09n×2 + 0.081n×3 + 0.073n×4 + 0.066n×5 + 0.059n×6 + 0.053n×7 + 0.478n×7 = 5.216n symbols.
The average codeword length per message is therefore 2.466n / 5.216n = 0.47: slightly better than the equal-length block case, with the same table size (8 rows).
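The same kind of verification for these figures (my own; 0.9**k * 0.1 is the probability of k A's followed by a B, and 0.9**7 that of a "7+" run):

```python
probs = [0.9**k * 0.1 for k in range(7)] + [0.9**7]   # run lengths 0..6, then 7+
clens = [3, 4, 4, 4, 4, 4, 4, 1]                      # codeword lengths above
nsyms = [k + 1 for k in range(7)] + [7]               # message symbols per run
bits = sum(p * c for p, c in zip(probs, clens))       # 2.465... ~ 2.466
syms = sum(p * m for p, m in zip(probs, nsyms))       # 5.216...
print(bits / syms)                                    # 0.472... ~ 0.47
```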

28 Summary of today's lecture

- the bound theorem on the average codeword length (slide 5)
- Shannon's source coding theorem (slide 17)
- extended uses of Huffman codes:
  - (equal-length) block coding
  - unequal-length block coding
    - the block-partitioning scheme
    - the run-length scheme (which can be regarded as a special case of block partitioning)

29 Exercises

- Write a program that constructs a Huffman code from given messages and their occurrence probabilities.
- Modify the program so that it can handle (equal-length) block coding.
- Choose suitable messages and occurrence probabilities, and experiment with the program to observe how the average codeword length per message changes with the block length.

