exercise from the last class
- write a program which constructs a Huffman code for given messages and their occurrence probabilities
- modify the program so that it can handle (equal-length) block coding
- choose some messages and probabilities, and observe experimentally how the average codeword length per message changes with the block size
sorry, no sample answers. the program is a typical application of the data structure called a "priority queue":
- two operations are defined on it: "enqueue a data item with a given priority" and "dequeue the item with the highest priority"
- let a tree be the data item and its probability the priority (a smaller probability means a higher priority)
- repeat: dequeue two trees from the queue, merge them into one, and enqueue the merged tree
- when only one tree remains in the queue, label the edges of the tree to complete the code
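As a hint rather than a model answer, the queue-based procedure above maps directly onto Python's standard heapq module; the four-symbol source used below is a made-up example:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability} with a priority queue.

    A queue entry is (probability, tiebreak, tree); a tree is a symbol or a
    (left, right) pair.  Repeatedly dequeue the two least probable trees,
    merge them, and enqueue the merged tree; when one tree remains, label
    its edges (0 = left, 1 = right) to obtain the codewords.
    """
    tiebreak = count()  # avoids comparing trees when probabilities tie
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (t1, t2)))
    code = {}
    def label(tree, prefix):
        if isinstance(tree, tuple):
            label(tree[0], prefix + "0")
            label(tree[1], prefix + "1")
        else:
            code[tree] = prefix or "0"  # lone-symbol corner case
    label(heap[0][2], "")
    return code

code = huffman_code({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125})
```

Block coding then amounts to calling the same function on the product distribution over length-n message tuples.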
review of the last class
- Shannon's source coding theorem
  - the average codeword length per symbol is always H_1(S) or more
  - a code whose average codeword length is smaller than H_1(S) + 1 can be constructed
- block coding
  - grouping several messages into blocks improves the compactness
  - compactness is measured by the average codeword length per symbol (it can be regarded as the compression rate)
supplementary comments on the previous class (1-1)
why does block coding improve the compactness?
- to make the average codeword length equal the theoretical bound H_1(S), we would need an "ideal code" whose codeword lengths are real numbers. for example, if A and B occur with probabilities 0.8 and 0.2, respectively, the ideal lengths are l_1 = -log2 0.8 = 0.322 and l_2 = -log2 0.2 = 2.322
- in reality, codewords are symbol sequences, so their lengths must be integers, and a gap exists between the ideal and the actual values:
  - 0.322 bits ⇒ 1 bit: the compactness degrades (A gets an unnecessarily long codeword)
  - 2.322 bits ⇒ 1 bit: the compactness improves (B gets the unused shorter codeword of length one)
- unfortunately, A and B do not occur equally often: the "unnecessarily long" codeword is used frequently and the "shorter" one only occasionally, so in total the average codeword length degrades
supplementary comments on the previous class (1-2)
again, why does block coding improve the compactness?
- consider how much a message with probability p_i contributes to the average codeword length
  - ideally: p_i (-log p_i)
  - in reality: p_i [-log p_i], where [-log p_i] is an integer approximating -log p_i
- the gap between the ideal and the actual contributions is | p_i (log p_i - [log p_i]) | (see the figure)
- roughly speaking, the gap grows as p_i grows; if p_i is small, the "integer codeword length" restriction hardly spoils the average codeword length
- block length n → large ⇒ every p_i → small ⇒ the gap between the ideal and the actual → small
(figure: the gap | p_i (log p_i - [log p_i]) | plotted against p_i over [0, 1.0])
supplementary comments on the previous class (2)
on operating block coding in practice
- a finite message sequence may end with a fragment that cannot form a block if the system is implemented naively:
  - equal-length blocks of 10 messages, but 25 messages in total ⇒ the last 5 messages are left over, too few to constitute a block
  - in run-length coding, the symbol terminating the last run may be missing
- such problems are usually avoided operationally:
  - encode the last block by a different rule, or
  - add "padding" (filler) data so that the remaining messages constitute a block, and transmit the number of messages
today's class
source coding schemes other than the Huffman code:
1) arithmetic codes: an efficient coding scheme whose implementation is simple
2) Lempel-Ziv codes: coding schemes for the case where the occurrence probabilities of the messages are not known in advance
3) lossy coding (for images and sounds): coding schemes which trade a tolerable loss of information for better compactness
the three topics are largely independent of one another
arithmetic codes
an efficient coding scheme whose implementation is simple
Huffman codes and their implementation
- a Huffman code defines a one-to-one correspondence between messages and codewords
  ⇒ the encoder and the decoder must store a table recording the correspondence
  ⇒ the table tends to grow large when we try to make the code more efficient (more compact)
- arithmetic codes get around this problem: encoding and decoding are performed on-the-fly without a correspondence table
  ⇒ a more compact implementation becomes possible
- instead of encoding messages (or blocks) one by one, the entire message sequence is converted into a single encoded sequence
  (one may view this as "the whole message sequence forms one block")
arithmetic codes: the basic idea
- a memoryless stationary binary source: P(A) = p, P(B) = 1 - p
- encode message sequences of length n ⇒ 2^n possible sequences
- for p = 0.7 and n = 3: eight sequences w_0, ..., w_7, listed in dictionary order
- compute the occurrence probability S(w_i) of each sequence and the cumulative probability L(w_i):

  i       0      1      2      3      4      5      6      7
  w_i     AAA    AAB    ABA    ABB    BAA    BAB    BBA    BBB
  S(w_i)  0.343  0.147  0.147  0.063  0.147  0.063  0.063  0.027
  L(w_i)  0.000  0.343  0.490  0.637  0.700  0.847  0.910  0.973

- S(w_i) is the occurrence probability of w_i; L(w_i) is the probability that some sequence preceding w_i occurs, i.e. the sum of S(w_j) over j < i, so that L(w_i) = L(w_{i-1}) + S(w_{i-1})
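The table can be reproduced with a few lines of Python (a sketch, not part of the lecture material):

```python
from itertools import product

p = 0.7  # P(A); P(B) = 1 - p
seqs = ["".join(t) for t in product("AB", repeat=3)]  # dictionary order
S = {w: p ** w.count("A") * (1 - p) ** w.count("B") for w in seqs}

L, total = {}, 0.0
for w in seqs:       # L(w_i) = sum of S(w_j) over j < i
    L[w] = total
    total += S[w]
```

For instance S["AAA"] is 0.343 and L["ABB"] is 0.637, matching the table, and the last L value plus the last S value sums to 1.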
the idea: sequences and subintervals
- each sequence can be associated with a subinterval of [0, 1]:
  sequence w_j ... interval I_j = [L(w_j), L(w_{j+1}))
  the width of I_j is S(w_j) and its left end is L(w_j)
(figure: the interval [0, 1] partitioned into the eight subintervals AAA, AAB, ..., BBB, with widths 0.343, 0.147, ..., 0.027)
- idea: represent the sequence w_j by a representative point x ∈ I_j (a real number); told x, the receiver can determine which sequence w_j occurred
- to implement this, two problems must be addressed:
  - how do we compute the correspondence between x and I_j?
  - how do we represent the real value x?
tree representation of the intervals and cumulative probabilities
- the correspondence between sequences and real values must be computed in two directions:
  [encoding] from a sequence, find the corresponding interval
  [decoding] from a real value, find the corresponding sequence
⇒ both can be computed using a tree-structured representation of the intervals
(figure: a binary tree over the partition; sequences starting with A occupy the left half of [0, 1] and those starting with B the right half, and each subinterval is divided further according to the second symbol, and so on)
- by exploiting this structure we can determine the subinterval corresponding to a given sequence (encoding) and the sequence corresponding to a given value (decoding), as shown in the following slides
from a sequence to its interval
- define a root node ε with S(ε) = 1 and L(ε) = 0 (intuitively: the interval [0, 1] has width 1 and left end 0, not yet partitioned)
- the values at the child nodes are computed on demand:
    S(wA) = S(w) p          L(wA) = L(w)
    S(wB) = S(w)(1 - p)     L(wB) = L(w) + S(w) p
  (S(w) is the width of the interval under consideration; L(w) is its left end)
- example with P(A) = p = 0.7: which interval corresponds to ABB?
    S(ε) = 1,        L(ε) = 0
    S(A) = 0.7,      L(A) = 0
    S(AB) = 0.21,    L(AB) = 0.49
    S(ABB) = 0.063,  L(ABB) = 0.637
  ⇒ the interval of ABB is [0.637, 0.700): from the L(·) value of the reached node to the L(·) value plus the S(·) value
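The recursion in the box can be run iteratively along the message sequence; `interval` below is a hypothetical helper name, not from the slides:

```python
def interval(seq, p):
    """Return [L, L + S) for a sequence over {A, B} with P(A) = p.

    Follows the slide's recursion: going to the A-child keeps L and
    shrinks S to S*p; going to the B-child moves L to L + S*p and
    shrinks S to S*(1 - p).
    """
    S, L = 1.0, 0.0
    for sym in seq:
        if sym == "A":
            S *= p
        else:
            L += S * p
            S *= 1 - p
    return L, L + S

lo, hi = interval("ABB", 0.7)
```

With p = 0.7, interval("ABB", 0.7) gives the slide's interval [0.637, 0.700).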
from a real value to its sequence
- recursively compute S(·) and L(·) as in the previous slide
- at each node, the target value x decides whether to go left or right:
    x < L(wB) ⇒ visit the left child
    x ≥ L(wB) ⇒ visit the right child
- no correspondence table is needed; only arithmetic operations are used
- example: which sequence corresponds to x = 0.600?
    at ε:  L(B) = 0.7,     x < 0.7   ⇒ go to A (a sequence of length one)
    at A:  L(AB) = 0.49,   x ≥ 0.49  ⇒ go to AB
    at AB: L(ABB) = 0.637, x < 0.637 ⇒ go to ABA
  ⇒ 0.600 corresponds to the sequence ABA
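The decoding traversal is the same recursion, driven by x instead of the symbols (`decode_value` is a made-up name):

```python
def decode_value(x, p, n):
    """Recover the length-n sequence whose interval contains x, table-free."""
    S, L = 1.0, 0.0
    out = []
    for _ in range(n):
        boundary = L + S * p      # L(wB): left end of the B-subinterval
        if x < boundary:          # x lies in the A-subinterval
            out.append("A")
            S *= p
        else:                     # x lies in the B-subinterval
            out.append("B")
            L = boundary
            S *= 1 - p
    return "".join(out)
```

decode_value(0.600, 0.7, 3) reproduces the slide's answer, ABA.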
representing the real value x
- the binary representation of x is, in effect, the codeword, so the shorter the better
- we want the point in the interval whose representation is shortest
- rule: take the first ⌈-log2 S(w_j)⌉ bits of the binary fraction of L(w_{j+1}) as x
- L(w_j) and L(w_{j+1}) differ within the first ⌈-log2 S(w_j)⌉ fraction bits, so the truncated value still falls inside I_j and distinguishes the sequences
- the resulting codeword length is about -log2 S(w_j), so the average codeword length is approximately Σ_j S(w_j)(-log2 S(w_j)) = H(S)
⇒ arithmetic codes achieve almost the same efficiency as Huffman codes
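A sketch of the bit-extraction rule (the helper name is made up; it pulls bits out of an ordinary floating-point value, which is precise enough for short examples but not for long sequences, as the next slide notes):

```python
import math

def codeword(L_next, S):
    """First ceil(-log2 S(w_j)) fraction bits of L(w_{j+1}), as a bit string."""
    k = math.ceil(-math.log2(S))
    bits, f = [], L_next
    for _ in range(k):
        f *= 2
        bit = int(f)      # integer part of the doubled fraction
        bits.append(str(bit))
        f -= bit
    return "".join(bits)

# for the sequence ABB: S(ABB) = 0.063 and L(BAA) = 0.700
cw = codeword(0.700, 0.063)
x = int(cw, 2) / 2 ** len(cw)   # the real value the bits represent
```

Here cw has ⌈-log2 0.063⌉ = 4 bits and x falls inside [0.637, 0.700), so the traversal of the previous slide recovers ABB from it.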
summary of arithmetic coding
- arithmetic codes achieve performance comparable to Huffman codes without any large correspondence table
- all computation can be done on-the-fly, which makes the implementation easy
- a drawback: encoding a long sequence requires repeated high-precision multiplications
  ⇒ approximate, multiplication-free computation methods have been studied
Lempel-Ziv codes
coding schemes for the case where the occurrence probabilities of the messages are not known in advance
on the probabilities of messages
- the discussion so far assumed that the occurrence probabilities of the messages are known in advance
- for real-world message sequences, the probabilities are often unknown
- a naive solution: the two-scan scheme
  - first scan: measure the occurrence probability of each symbol
  - second scan: encode (with a Huffman code, for example)
⇒ this causes an encoding delay, and the codeword table must be attached, since the decoder does not know it in advance
the Lempel-Ziv algorithms
efficient coding schemes that work even when the probabilities of the messages are unknown:
- LZ77: adopted by compression tools such as lha, gzip, zip and zoo
- LZ78: adopted by compress, arc, stuffit, etc.
- LZW: adopted by image formats such as GIF and TIFF
they perform well for almost any information source ⇒ universal coding
LZ77
- proposed by A. Lempel and J. Ziv in 1977
- a subsequence of messages is represented by its longest match with a pattern that has appeared earlier in the sequence
algorithm overview:
- scan the sequence from the beginning, defining blocks dynamically and encoding them
- each block is represented by a three-tuple (i, l, x): "the length-l subsequence starting i symbols before the current position, followed by the symbol x"
(figure: the already-encoded sequence, the copied region running from position -i to -i+l, and the appended symbol x)
an LZ77 encoding example
encode ABCBCDBDCBCD:

  block  situation                                       codeword
  A      first occurrence                                (0, 0, A)
  B      first occurrence                                (0, 0, B)
  C      first occurrence                                (0, 0, C)
  BCD    same as 2 symbols before for 2 symbols, then D  (2, 2, D)
  BD     same as 3 symbols before for 1 symbol, then D   (3, 1, D)
  CBCD   same as 6 symbols before for 4 symbols          (6, 4, *)

- the first three blocks have length one, since no earlier pattern can represent the characters A, B and C
- the fourth block has length three: it copies a substring of length two that occurred earlier and appends the character D
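A minimal greedy encoder following this (i, l, x) convention might look as below; it is only a sketch (real LZ77 implementations bound the search window), and None plays the role of the slide's "*" end marker:

```python
def lz77_encode(s):
    """Encode s as (i, l, x) blocks: copy l symbols starting i symbols
    back, then append the literal x (None when the input ends)."""
    out, pos = [], 0
    while pos < len(s):
        best_i, best_l = 0, 0
        for i in range(1, pos + 1):        # candidate back-offsets
            l = 0
            # overlapping copies are legal: the source region may
            # run into the block currently being produced
            while pos + l < len(s) and s[pos + l - i] == s[pos + l]:
                l += 1
            if l > best_l:
                best_i, best_l = i, l
        nxt = s[pos + best_l] if pos + best_l < len(s) else None
        out.append((best_i, best_l, nxt))
        pos += best_l + 1
    return out
```

lz77_encode("ABCBCDBDCBCD") yields exactly the six codewords in the table above, with None in place of "*".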
an LZ77 decoding example
decode (0, 0, A), (0, 0, B), (0, 0, C), (2, 2, D), (3, 1, D), (6, 4, *):
- the decoder receives the codewords one by one and reconstructs the original message sequence incrementally
- note that the decoder can rebuild the block structure dynamically, exactly as it was built at the encoder
remaining issue: how should the integer values be represented?
- large integers require correspondingly long representations
- a block exceeding the representable length must be split into pieces
⇒ compared with LZ78, there is a slight loss of efficiency
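The matching decoder is only a few lines (a sketch; None stands for the "*" marker, and symbols are copied one at a time so that overlapping references also work):

```python
def lz77_decode(blocks):
    """Rebuild the message sequence from (i, l, x) blocks."""
    out = []
    for i, l, x in blocks:
        for _ in range(l):
            out.append(out[-i])   # out grows, so -i tracks the copy source
        if x is not None:
            out.append(x)
    return "".join(out)
```

Applied to the six codewords above, it returns ABCBCDBDCBCD.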
LZ78
- proposed by A. Lempel and J. Ziv in 1978
- a pattern is represented not by a three-tuple (i, l, x) but by a two-tuple (b, x): "the block b blocks before the current one, with the character x appended"
(figure: the already-encoded sequence of blocks, the block b positions back, and the appended symbol x)
an LZ78 encoding example
encode ABCBCBCDBCDE:

  block no.  block  situation                        codeword
  1          A      first occurrence                 (0, A)
  2          B      first occurrence                 (0, B)
  3          C      first occurrence                 (0, C)
  4          BC     the block 2 before, plus C       (2, C)
  5          BCD    the block 1 before, plus D       (1, D)
  6          BCDE   the block 1 before, plus E       (1, E)

- the fourth block, for example, is defined by appending C to the second block
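An encoder for this (b, x) convention can be sketched as follows (hypothetical helper name; for simplicity it assumes the input ends exactly where a block ends, as in the example):

```python
def lz78_encode(s):
    """Encode s as (b, x) blocks: the block b positions before the
    current one, extended by symbol x (b = 0 means x starts a block)."""
    blocks = []                 # text of every block produced so far
    out, pos = [], 0
    while pos < len(s):
        match, length = 0, 0    # 1-based index of the longest matching block
        for idx, blk in enumerate(blocks, start=1):
            if len(blk) > length and s.startswith(blk, pos):
                match, length = idx, len(blk)
        x = s[pos + length]     # the appended literal
        b = len(blocks) + 1 - match if match else 0
        out.append((b, x))
        blocks.append((blocks[match - 1] if match else "") + x)
        pos += length + 1
    return out
```

lz78_encode("ABCBCBCDBCDE") reproduces the table: (0, A), (0, B), (0, C), (2, C), (1, D), (1, E).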
an LZ78 decoding example
decode (0, A), (0, B), (0, C), (2, C), (1, D), (1, E):
- reconstruct the original message sequence from the codewords one by one, just as in the LZ77 case
- the codewords are more compact than those of LZ77
- the block size is not explicitly embedded in the representation (b, x), so there is no upper bound on the size of a block
⇒ LZ78 performs slightly better than LZ77
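The decoding rule can be sketched in a few lines (made-up name): block k is the block b positions before it, plus the literal x.

```python
def lz78_decode(codewords):
    """Rebuild the sequence from (b, x) blocks."""
    blocks = []
    for b, x in codewords:
        prev = blocks[-b] if b else ""   # b = 0: the block is x alone
        blocks.append(prev + x)
    return "".join(blocks)
```

It maps the six codewords above back to ABCBCBCDBCDE.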
summary of universal coding
- the LZ algorithms build the pattern-codeword correspondence adaptively
- they achieve high performance even when the occurrence probabilities of the messages are unknown
- they naturally handle messages generated by sources with memory
- LZW was subject to a patent claim asserted by UNISYS
  ⇒ the patent expired in 2004, and the algorithm can now be used freely
source coding for images and sounds
lossy coding: trading a tolerable loss of information for better compactness
reversibility of coding
- the coding schemes considered so far: decoding the encoded sequence yields a message sequence completely identical to the original
  ⇒ reversible, or lossless, coding
- relaxing the reversibility condition may improve the compactness further
  ⇒ non-reversible, or lossy, coding
- intended mainly for information ultimately received by humans, such as images and sounds
- this class does not discuss the individual coding schemes (take the specialized courses on those topics)
lossy coding and mutual information
regard "encoding" as a channel: input X (the messages), output Y (the encoded sequence)
- lossless code: X can be recovered (decoded) completely from the codeword Y
  ⇒ mutual information I(X; Y) = H(X) - 0 = H(X), since the ambiguity of X vanishes once Y is known
- lossy code: some ambiguity about X remains even after Y is received
  ⇒ mutual information I(X; Y) = H(X) - H(X | Y) < H(X)
⇒ the encoded sequence of a lossy code carries less mutual information than that of a lossless code
the source coding theorem revisited
- Shannon's source coding theorem: the average codeword length of a lossless code is H(X) or more
- for a lossless code H(X) = I(X; Y); more generally, the average codeword length of a (lossy) code with mutual information I(X; Y) is I(X; Y) or more
- in practice, the notion of "distortion" is more useful
- rate-distortion function R(D): the smallest mutual information needed to achieve distortion D (assuming the distortion is measured quantitatively)
- the source coding theorem with distortion (the rate-distortion theorem): to achieve distortion D, an average codeword length of R(D) or more is necessary
- this is only a very brief sketch of rate-distortion theory; refer to technical books for detailed discussions
summary of today's class
coding schemes with aims different from those of the previous classes:
- arithmetic coding: needs no message-codeword correspondence table
- LZ coding: needs no statistical knowledge of the messages
- lossy coding: trades information loss for compactness
exercises (quiz; not a report assignment)
- encode ABACABADACABAD using the LZ77 algorithm
- decode the codeword sequence obtained above
- survey what has happened concerning the patent on the LZW algorithm
- survey the lossy coding algorithms used in practical applications
no class on Apr. 22 (THU)