前回の練習問題 ABACABADACABADを,LZ77法により符号化せよ 上記問題により得られた符号語系列を復号せよ

Slides:



Advertisements
Similar presentations
だい六か – クリスマスとお正月 ぶんぽう. て form review ► Group 1 Verbs ► Have two or more ひらがな in the verb stem AND ► The final sound of the verb stem is from the い row.
Advertisements

て -form - Making て -form from ます -form -. With て -form, You can say... ~てもいいですか? (= May I do…) ~てください。 (= Please do…) ~ています。 (= am/is/are doing…) Connecting.
第 5 章 2 次元モデル Chapter 5 2-dimensional model. Contents 1.2 次元モデル 2-dimensional model 2. 弱形式 Weak form 3.FEM 近似 FEM approximation 4. まとめ Summary.
Essay writing rules for Japanese!!. * First ・ There are two directions you can write. ・よこがき / 横書き (same as we write English) ・たてがき / 縦書き (from right to.
VE 01 え form What is え form? え? You can do that many things with え form?
前回の練習問題 12ページの例において,3次,4次のエントロピーを求めよ.可能であれば,n 次エントロピーを計算するプログラムを書け.
SS2-15:A Study on Image Recognition and Understanding
英語特別講座 疑問文 #1    英語特別講座 2011 疑問文.
英語勉強会.
第1回レポートの課題 6月15日出題 今回の課題は1問のみ 第2回レポートと併せて本科目の単位を認定 第2回は7月に出題予定
日本語の文法 文型(ぶんけい)をおぼえよう!
Chapter 11 Queues 行列.
日本語... ジェパディー! This is a template for you to use in your classroom.
前回の練習問題 6個の情報記号を,以下のように配置し,水平垂直パリティ検査符号を構成する. この符号の検査行列を求めよ
Chris Burgess (1号館1308研究室、内線164)
What did you do, mate? Plain-Past
Training on Planning & Setting Goals
Object Group ANalizer Graduate School of Information Science and Technology, Osaka University OGAN visualizes representative interactions between a pair.
前回の練習問題 a = とする(|a|=36) a を長さ2のブロックに区切り,x2値を計算せよ
前回の練習問題 通報とその発生確率を入力すれば,ハフマン符号を構成するプログラムを作成せよ
Only One Flower in the World
日本人の英語文章の中で「ENJOY」はどういうふうに使われているのか
Noun の 間(に) + Adjective Verb てform + いる間(に) during/while.
Model Checking (2) Temporal Logic
前回の練習問題について 情報記号 (x1, …, xk) に対し,検査記号 p = x1+…+xk+1として与えられる奇パリティ符号を考える.この符号が線形符号とならないことを証明せよ. 解答例: 線形符号とならない反例を示せばよい. x1=1, x2=x3=...=xk=0 ⇒ p = 0,対応する符号語は
Tohoku University Kyo Tsukada
A 02 I like sushi! I like origami!
十年生の 日本語 Year 10 Writing Portfolio
Reasonので + Consequence clause
The future tense Takuya Mochizuki.
Licensing information
Chapter 4 Quiz #2 Verbs Particles を、に、で
The Sacred Deer of 奈良(なら)
Who Is Ready to Survive the Next Big Earthquake?
Did he/she just say that? Get your head out of the gutter! Oh wait….
“You Should Go To Kyoto”
VTA 02 What do you do on a weekend? しゅうまつ、何をしますか。
Nihongo Japanese 日本ご ‘Numbers ’ & ‘Hiragana Revision’
ストップウォッチの カード ストップウォッチの カード
P4-21 ネットワーク上の経路に対する 回帰問題について
芝野耕司 ISO/IEC JTC1/SC2 (Coded Character Sets)委員長 東京外国語大学
前回の練習問題 A B C D E F G H 確率 2元ハフマン符号を構成せよ 平均符号長を求めよ 0.363× × ×5= answers.
Causative Verbs Extensively borrowed from Rubin, J “Gone Fishin’”, Power Japanese (1992: Kodansha:Tokyo) Created by K McMahon.
全国粒子物理会 桂林 2019/1/14 Implications of the scalar meson structure from B SP decays within PQCD approach Yuelong Shen IHEP, CAS In collaboration with.
-Get test signed and make corrections
Model Checking (2) Temporal Logic
Term paper, Report (1st, first)
WELCOME TO THE WORLD OF DRAGON BALL
Where is Wumpus Propositional logic (cont…) Reasoning where is wumpus
豊田正史(Masashi Toyoda) 福地健太郎(Kentarou Fukuchi)
大規模なこと Large scale.
Question Words….
クイズやゲーム形式で紹介した実例です。いずれも過去のインターン作です。
いくらですか?.
2019年4月8日星期一 I. EPL 84, (2008) 2019年4月8日星期一.
超伝導回路を用いた 物理乱数発生回路の研究
Expressing uncertainty: Might
2019/4/22 Warm-up ※Warm-up 1~3には、小学校外国語活動「アルファベットを探そう」(H26年度、神埼小学校におけるSTの授業実践)で、5年生が撮影した写真を使用しています(授業者より使用許諾済)。
Term paper, report (2nd, final)
第1回レポートの課題 6月24日出題 今回の課題は1問のみ 第2回レポートと併せて本科目の単位を認定 第2回は7月に出題予定
Genetic Statistics Lectures (4) Evaluation of a region with SNPs
北大MMCセミナー 第62回 附属社会創造数学センター主催 Date: 2016年11月4日(金) 16:30~18:00
ー生命倫理の授業を通して生徒の意識に何が生じたかー
The difference between adjectives and adverbs
Created by L. Whittingham
Cluster EG Face To Face meeting
Grammar Point 2: Describing the locations of objects
Term paper, report (2nd, final)
アノテーションガイドラインの管理を行う アノテーションシステムの提案
Improving Strategic Play in Shogi by Using Move Sequence Trees
Presentation transcript:

前回の練習問題 ABACABADACABADを,LZ77法により符号化せよ 上記問題により得られた符号語系列を復号せよ 4文字前 長さ3 A B AC ABAD ACABAD 6文字前 長さ6 2文字前 長さ1 符号語系列は (0, 0, A), (0, 0, B), (2, 1, C), (4, 3, D), (6, 6, *) quiz in the last class - encode the sequence using LZ77 algorithm the figure illustrates the computation. We need two numerical information to represent one block; how many symbols we need to look back, and how many symbols we need to copy. - decode the result of the above encoding do the converse of the above encoding procedure. 上記問題により得られた符号語系列を復号せよ 符号化と逆の手順を行えばよい 4文字前 長さ3 A B AC ABAD ACABAD 6文字前 長さ6 2文字前 長さ1

前回の練習問題 LZWアルゴリズムの特許に関し,どのような問題が 発生したか調べよ UNISYS社が特許権を保有していた 当初は特許権の行使をしないと言明 ⇒ 後に方針転換 GIF 画像を扱うソフトウェアの取り扱いに混乱 2003年に米国,2004年に日本での特許失効 非可逆符号化を実現するアルゴリズムにはどのような ものがあるか調べよ (アルゴリズムとデータフォーマットの区別が明確でないが) JPEG, MPEG, GIF 等 quiz in the last class (cnt’d) - survey what has happened concerning the patent of LZW algorithm Originally, UNISYS holds the patent of LZW. At the beginning, UNYSYS said that they will not charge for the use of the LZW algorithm, because they considered that the algorithm will not have commercial importance. Some years later, the web become popular and the people in UNISYS found that good algorithms for images data play important roles in many on-line business. They changed the policy and said that they will charge for the use of LZW algorithm. This made a pretty confusion because the GIF encoding which uses the LZW algorithm were so popular that time. The patent has been expired in 2003 in the US, and in 2004 in Japan. - survey lossy coding algorithms which are used in practical applications there are many algorithms, though the distinction of the “algorithm” and “data format” is not clear...

本日の講義 乱数,擬似乱数について...「情報理論」からは少し寄り道 乱数,擬似乱数 Kolmogorovによる乱数の定義 擬似乱数の統計的検定 x2検定 擬似乱数の生成法 today’s class random sequence and pseudo-random sequence this is not on the main-stream of the information theory - random numbers and pseudo random numbers (sometimes “numbers” and “a sequence” are used confusingly in this slide) - the definition of randomness by Kolmogorov - test of randomness - chi-square test - algorithms for generation random numbers

乱数とは 乱数,乱数系列(random numbers, random sequence) 広辞苑による記述: 0から9までの数字から無作為復元抽出を繰り返して得られる数 の列.完全に無秩序でかつ全体としては出現の頻度が等しい. 工学の立場からは,上記の記述では不十分 「0から9までの数字」というのは本質的でない ⇒ 0 と 1 だけからなる乱数を考える場合も多い 「完全に無秩序」が,どのような状態を示すのか明確でない what is a random number? according to a dictionary, random numbers is a sequence of numbers, between 0 to 9, which are chosen with no special intensions or rules. It is a disordered sequence, and the probabilities of each number in the sequences are almost all equal. This definition is not “engineering” - the restriction “0 to 9” is not essential, and we often consider random numbers with 0s and 1s. - it is not clear what “disorder”means.

「無秩序さ」の定式化 「無秩序さ」:統計的に記述するのが一般的だが... 計算機の出現により,「無秩序さ」にも新しい定式化が出現 有限系列 x を出力するプログラムの集合を (x) とする プログラム...入力なしの決定性プログラム(毎回同じ動作) プログラム pの記述長を |p| と書く(文字数,行数等) x のコルモゴロフ複雑さ(Kolmogorov complexity) formalization of the “disorder” The disorder might be formalized by statistical means, but a new formalization has been investigated after the invention of computers and computer programs. Let x be a sequence with finite-length, and let Pi(x) denote the set of programs which outputs x, where a program in this context is a deterministic procedure which takes no inputs. The size of a program is denoted by |p| (|p| can be a number of lines or characters of the program). The Kolmogorov complexity of a sequence x is defined as the size of the smalless program in Pi(x). (x を出力する最小のプログラムのサイズ)

コルモゴロフ複雑さの例(1) x1 = “0101010101010101010101010101010101010101”... 40文字 x1 を出力するプログラム プログラムp1 : printf(“010101...01”); ...51文字 プログラム p2 : for(i=0;i<20;i++)printf(“01”); ...30文字 ⇒K(x1)  30 < 40 x2 = “0110100010101101001011010110100100100010”... 40文字 x2 を出力するプログラム プログラムp1 : printf(“011010...10”); ...51文字 ⇒K(x2)  51 Here are some examples. Note that the Kolmogorov complexity is an ideological notion, and it is difficult to determine the Kolmogorov complexity as a numerical value. Consider these examples as just sketch of ideas of Kolmogorov. Let x_1 be the sequence given in the slide. It consists of 40 characters. We can consider two programs which output x_1. The program p_1 outputs the string by a single printf sentence, and the program will contain 51 characters (ignore headers) . We can also consider another program p_2 which makes use of the “for” sentence. The size of this program is 30 characters. This suggests that the Kolmogorov complexity of the sequence x_1 is 30 or less. Next consider x_2. For this sequence we can consider a program (again) referred as p_1, and the program contains 51 characters. Unfortunately I could not find any other program which outputs x_2 and which is smaller than this p_1. The Kolmogorov complexity of x_2 is therefore 51 or less.

コルモゴロフ複雑さの例(2) x3 = “11235813213455891442333776109871597...”... 100万文字 プログラムp1 : printf(“1123...”); ...100万+11文字 プログラム p2 : フィボナッチ数列を計算して出力 ⇒ 数百文字で十分 ⇒K(x3) は数百文字以下 コルモゴロフ複雑さ K(x): x の統計量ではなく,x の持つ構造・規則性の指標となっている Here is another example. Assume that there is a sequence x_3 which starts as illustrated and one million characters follow. For this sequence we can consider a program with one printf sentence whose size become one-million plus 11 characters. For this sequence, we can find another program if we realize that the sequence is actually a Fibonacci sequence. A program which computes the Fibonacci sequence will be realized by several hundred characters only, and the Kolmogorov sequence of x_3 is about several hundred bits. An interesting aspects of the Kolmogorov complexity is that, it is a measure of the “structure” and “regularity” rather than statistics.

コルモゴロフ複雑さと無秩序さ コルモゴロフ複雑さが大きいほど「無秩序」の程度が大きい ただし,コルモゴロフ複雑さを実際に計算することは困難 「無秩序さ」を定式化するための概念的なもの エントロピーとの違いは? エントロピーは,記号発生の統計的側面だけに着目 コルモゴロフ複雑さは,記号発生のメカニズムも考慮に Kolmogorov complexity and the “disorder” Larger Kolmogorov complexity implies that the sequence is more disorder. However, it is impossible determine the actual values of the Kolmogorov complexity. It is a “conceptual” measure rather than concrete one. You may wonder what is the difference between Kolmogorov complexity and the entropy. The entropy is a measure of disorder but it is defined according to the statistical aspect of the sequence. On the other hand, Kolmogolov complexity suggests the “complicatedness” of the mechanism which generates the sequence.

コルモゴロフ的な乱数の定義 コルモゴロフ複雑さに基づく乱数の定義: 系列 x について,K(x)  |x| のとき,x を乱数(系列)という x を記述するには,x を「ベタ書き」するしかない x をコンパクトに表現することができない(圧縮不能) プログラムもデータも2進系列で表現すると考えると... 長さ n の2進系列...2n 個存在 長さ n 未満の2進系列...2n – 1 個しか存在しない ⇒ 必ず,K(x)  x である系列(圧縮不能な系列)が存在する ⇒ 任意長の(コルモゴロフの意味での)乱数が必ず存在する The Kolmogorov complexity is used to formalize randomness. A sequence x is said to be random if K(x), the Kormogolov complexity of x, equals to or more than the length of the sequence x. In other words, to write x “as is” (by using one printf sentence”) is the most efficient way to represent x. There is no way to represent x more compactly than x itself. This definition allows us discussing how many random sequences exist in the world. Remind that a program is also a sequence, and assume naturally that a program is written as a binary sequence. For a positive integer n, there exist 2^n sequences whose length is n. On the other hand, the number of sequences which are shorter than n is only 2^n – 1. This suggests that there exists a sequence x with K(x) >= x, and that there exists a random sequence with an arbitrary length.

乱数の存在確率 Vn:長さ n の2進系列集合,|Vn| = 2n, |V1|+ |V2|+...+|Vn–1| = 2n–1 Next we consider “how common” random sequences is. Let V^n be the set of binary sequence with length n. The first line of the slide shows basic properties of the numbers of sequences in V^n. Assume that, 2^n -1 sequences out of 2^n sequences in V^n can be compressed (i.e. there is a shorter representation), then each of those sequences has a program which generates the sequences. The size of the program is less than n, and the program is a sequence in either of V^1, ..., V^{n-1}. This implies that no sequence in V^1, ..., V^{n-1} can be compressed, and that the sequences in V^1, ..., V^{n-1} are random. By extending this discussion, we can show that almost all sequences are random in the sense of Kolmogorov complexity. ⇒ ほとんどすべての系列は圧縮不能(乱数である)ことを導ける

乱数の計算可能性 ほとんどすべての系列は乱数である: 乱数の存在は保証しているが,具体的な作り方は不明 サイコロを振る,無作為に生成されるデータを利用 etc. 十分に無秩序で良い乱数を得ることができる 大量の乱数を得るには効率が悪い 専用の乱数生成装置を利用 大量の乱数を効率よく生成できる 特殊なハードウェアが必要でコスト高 真の乱数(真性乱数)を,大量かつ安価に生成するのは困難 Asymptotically saying, almost all sequences are random. However, we do not know how to generate random sequences. One common procedure to generate random sequences is to use a dice. This gives us random sequences with very good quality, but unfortunately it is not very efficient way. We will not be able to obtain mega bytes of random data by casting dices by ourselves. We may devise special hardware for generating random numbers, but such hardware must be expensive. Unfortunately, it is very difficult to generate “real” random sequences economically and efficiently.

擬似乱数 擬似乱数 (pseudorandom numbers): ある種の規則性にしたがって生成される「乱数のようなもの」 コルモゴロフ的な意味での乱数ではない コンピュータプログラム等により,大量かつ安価に生成可能 ⇒ 各種の実験やシミュレーション等の実施に有益 擬似乱数にも良いものと悪いものがある ⇒ 各種の検定により,擬似乱数の品質を評価することが必要 (検定法を頭に入れた後で,具体的な擬似乱数生成法を学ぶ) a pseudorandom number s are numbers which seem like random numbers, but generated according to a certain rule. In this sense, the Kolmogorove complexity of pseudorandom numbers is smaller than the original sequence, and they are not random numbers. However, such pseudorandom numbers are useful in various kinds of experiments and computer simulations. One note we need to remind is that there are “good” pseudorandom numbers and “bad” pseudorandom numbers. We would like to distinguish the good one bad one. After reviewing those “test” scheme, we consider algorithms for generating pseudorandom numbers.

統計的検定 擬似乱数の評価尺度: 予測が困難か? 暗号等の用途では重要だが,評価が難しい 十分に無秩序か? 数列の統計的性質を検定すれば良い...比較的容易 本講義では,統計的検定法について紹介 「あるプログラムから生成された十分長い系列が, どれだけ真性乱数に近い統計的性質を有するか」を議論 「特定の系列が乱数か否か」を判定する問題ではない statistical test there are several aspects in evaluating pseudorandom numbers. - is it difficult to guess the forthcoming sequences? this property is important if we use pseudorandom numbers in cryptographic purpose, but this is rather difficult subject. We do not discuss this issue in this class. - does the sequence have certain statistical biases? this can be evaluated by comparing statistics of the sequence with that of the real random sequences. In this class, we consider statistical test to evaluate pseudorandom numbers. We assume that we have a very long sequence, and evaluate how the statistics of the sequence show similar properties to real random sequences. Unfortunately this study does not give a clear answer to a naive question “is this sequence random?”

復習:確率,確率密度関数 確率変数:ある試行にともなって,なんらかの値を取る変数 確率分布: 確率変数が,どの確率でどの値を取るかをあらわしたモノ 確率変数が離散的な場合,確率分布は表の形で与えられる 確率変数が連続的な場合,確率分布は関数により与えられる review of statistics. remind the notion of a random variable and probability distribution function. If the random variable is not continuous, then the probability distribution function might be given by a table. 晴 曇 雨 雪 0.3 0.4 0.2 0.1 確率密度関数

確率密度関数について 確率密度関数(probability density function, pdf ) 連続的な確率変数の確率分布を表現する関数 確率変数が x 以上 y 未満の値を取る確率= = x y ここの面積 x → 最小値,y → 最大値のとき,面積=1 The probability distribution function, pdf is a function which represents the distribution of a certain continuous random variable. The probability that a random variable x takes a certain value between x and y is given by the integral. If x goes to the minimum value and y goes to the maximum value, then the integral approaches to one.

x2値の定義 確率変数Xの取る値を,有限個のクラス C1, C2, ..., Cl に分割する C1 = [0, 10), C2 = [10, 30), C3 = {30以上の偶数}, C4 =その他 理想的なケースにおいて,X が Ci に属する確率は pi とする 実際に観測された値の系列を a = a1, a2, ... an とする 系列 a が,どれだけ理想に近いか評価したい ⇒ 以下の x2 値(x2 value, Chi square value)を評価すれば良い This slides define x^2 values, where “x” is a Greek symbol “chi”. Assume that we partition the domain of a random variable X into l classes C_1, C_2, ..., C_l. If X takes integer values between 0 to 100, then one possible partition is C_1, C_2, C_3 (even numbers with 30 or more) and C_4 (others). In X follows an ideal distribution, then we can determine p_i which is the probability that X takes a value in C_i. In the real observations, X does not follow the ideal distribution. Let a_1, ..., a_l be the actual sequence which is observed as X in a real trial. To evaluate how a_1, .., a_l are close to the ideal case, we deifne x^2 square values as given in the slide, where n_i is the number of observations that X takes a value in C_i. (カイ2乗値) ni...系列 a において Ci に属する要素の個数

x2値の解釈 ni...系列 a において Ci に属する要素の個数 npi...長さnの理想的系列における ni の期待値 ⇒ 分子部分は 0 に近づく ⇒ x2 値も 0 に近づく Review the equation. n_i is the number of observations that X takes a value in C_i. np_i is the expected number of observations that ideal X takes a value in C_i. If the sequence a_1, ..., a_n is close to the ideal scenario, there it is expected that is not so much difference between n_i and np_i. In this case, the chi-square value approaches to zero.

x2値の計算例 1, 2, ...6 からなる42文字の系列(n = 42)を考える これが真性乱数なら,pi = 1/6 ⇒ npi = 7 a1 = 145325415432115341662126421535631153154363 n1 = 10, n2 = 5, n3 = 8, n4 = 6, n5 =8, n6 = 5 x2 = 32/7 + 22/7 + 12/7 + 12/7 + 12/7 + 22/7 = 20/7 a2 = 112111421115331111544111544111134411151114 n1 = 25, n2 = 2, n3 = 3, n4 = 8, n5 =4, n6 = 0 x2 = 182/7 + 52/7 + 42/7 + 12/7 + 32/7 + 72/7 = 424/7 a1 のほうが a2 よりも理想的なケースに近い Here are examples of computation of chi-square values. We consider several sequences of integers from 1 to 6, where the length of the sequences is n = 42. If this were a truly random sequence, then it is expected that each integer occurs with the same probability of p_i = 1/6, and the expected number of occurrences of each integer is np_i = 7. Consider a_1 first. In this sequence, there are ten “1”, five “2” and so on. The chi-square value of a_1 is computed as given in the slide. Next, consder a_2. This sequence contains so many “1” but no “6”, so biased. The chi-square value of a_2 is 424/7, much larger than the chi-square value of a_1. This suggests that a_1 is close to the truly random sequence than a_2.

x2値の計算例(続) a3 = 111111111111222222...666666 (長さ 72) n1 = 12, n2 = 12, n3 = 12, n4 = 12, n5 =12, n6 = 12 x2 = 02/12 + 02/12 + 02/12 + 02/12 + 02/12 + 02/12 = 0/12 理想的な乱数? 当然 no 長さ2のブロック化を考える...n = 36ブロック 理想は,“11”, ... “66” が1/36 の確率で発生,npi = 1 n11 = 6, n12 = 0, ..., n22 = 6, ... x2 = (6 – 1)2/1 + (0 – 1)2/1 + ... = 180 ⇒ かなり大きい 様々なクラス分割により,多面的に評価する必要がある Next, consider a sequence a_3 defined as this. For this example, the length of the sequence is 72. This is a quite unnatural sequence as you can see, but the chi-square value of a_3 is zero. Does this mean that a_3 is a “random” sequence? Obviously no. For example, consider to divide the sequence into blocks of size two. In this case, there are 36 patterns, and each pattern occurs equally likely. The sequence has 36 blocks and the expected number of each block occur is one. However, in the sequence a_3, we have six occurrences of the pattern “11”, zero occurrence of “12” and so on. The chi-square value of this “blocked” sequence is then determined to be 180. This is a pretty big value, suggesting that this sequence is not a random sequence. This example tells us one important fact. We need to evaluate a sequence using different partitions and different definitions of the classes.

x2値の評価 理想的な系列であっても,x2 値が必ず0になるとは限らない 理想的系列における x2 値の分布と,実測値とを比較すべき [定理] 分割クラス数が l のとき,理想的系列における x2 値の分布は, 自由度が l – 1 のx2 分布 (Chi square distribution)にしたがう We remark that truly random sequences do not always make the chi-square value to zero. To evaluate the randomness of pseudorandom numbers, we need to known the distribution of the chi-square values of truly random sequence. This distribution is known as the Chi-square distribution. Theorem: If the number of classes is l, then the chi-square values of truly random sequences obeys the chi-square distribution with degree l-1. The graph sketches the distributions for several degrees. 自由度 2 自由度 4 自由度 6 x2 O

x2値の比較 a3 = 111111111111222222...666666 (長さ 72) x2 = 02/12 + 02/12 + 02/12 + 02/12 + 02/12 + 02/12 = 0/12 クラス数が6 ⇒ 自由度が 5 の x2 分布に従うはず x2 O 自由度 5 Remind a_3 which we used previously. Without the “block” partition, we had considered six classes with each class corresponds to an integer between 1 to 6. If the sequence were truly random, then the chi-square value obeys the chi-square distribution with degree 5, which is illustrated as this. This graph suggests that it is quite rare that the chi-square value of a truly random sequence become zero. Too small chi-square values suggest that the sequence is unnaturally “ordered”. 理想系列では,x2 値が 0 になる確率はきわめて小さい 観測系列の x2 値が 0 ⇒ 理想系列では考えにくい振る舞い x2 値の大小だけでなく,理想的なx2分布と比較することが重要

x2値検定の実施例 x2 値の大小だけでなく,理想的なx2分布と比較することが重要 実際には... 元のデータをいくつかのデータセット(部分集合)に分割 各データセットに対して x2 値を計算 得られた x2 値の集合の妥当性を検証 再び統計的検定を行う 上下パーセント点と比較する 上側1%点... 100回に1回くらいは,この値を超えるかもという点 Here are some suggestions to make use of chi-square tests. Basically, the chi-square should be considered together with the chi-square distributions. In a typical scenario, the chi-square test is done as follows. 1. Divide the original sequence into several subsequences. Each subsequence is called a “data set” in this slide. 2. For each data set, compute the chi-square value. We have a set of chi-square values. 3. Compare the distribution of the chi-square values of the data sets and the chi-square distribution. If the two distributions are similar, then we can say that the original sequence has good randomness. The similarity of distribution can be defined in several different ways. - use a statistical test again - use percentile points: upper one-percentile point is a value (of chi-square values) such that one out of one hundred samples can have a chi-square value above this point.

その他の統計的検定手法 KS検定 (Kolmogorov-Smirnov検定) x2 検定の連続変数バージョン(クラス分割不要) ラン長検定 長さ k のラン発生確率 = 0.5×長さ k – 1 のラン発生確率 (2元の無記憶定常情報源で,通報発生が等確率のとき) ランの発生個数について,x2検定を行う手法 ポーカー検定,衝突検定,間隔検定 etc. いずれの方法も,十分長い系列に対して行う必要がある (短い系列では,統計的性質が出現しにくい) This slide shows several other statistical tests. KS test can be regarded as a continuous version of the chi-square test. We don’t have to define the family of classes in advance. The run-length test makes use of the distribution of runs in a sequence. In binary sequences, the probability that “a run with length k occurs” is half of the probability that “a run with length k – 1 occurs”. This allows us estimating the expected numbers of runs in a sequence. We can make use of this expected numbers to make the chi-square test based on run-length. There are several more test, but one common remark that they are useless if the length of the original sequence is too short.

擬似乱数生成アルゴリズムについて 様々な擬似乱数生成アルゴリズムが提案されている その多くが,少量の乱数の種(シード, seed)から,多量の 擬似乱数系列を生成するもの 線形合同法 (linear congruent method) 漸化式に基づいて擬似乱数を生成する方式 i 番目の値が Xi のとき,Xi+1 = aXi + c mod M とする a, c, M は適当なパラメータ(選び方にはコツがある) 得られる値は 0 から M – 1の整数値 algorithm for generating pseudorandom sequences. There are several algorithms for generating pseudorandom sequences (such algorithms are sometimes called pseudorandom sequence generator (PSG). PSG usually accepts a seed information, and generates plenty number of pseudorandom numbers by using the seed. The linear congruent method is one of typical PSG algorithm. In this algorithm, random numbers are determined according to a certain recursion. For example, if the i-th value of the sequence is X_i, then the next number is determined as X_{i+1} = aX_i + c mod M, where a, c and M are appropriately chosen parameters. This recursion gives us pseudorandom numbers between 0 to M – 1.

線形合同法の性質について 得られる擬似乱数列の周期は,必ず M 以下となる M は,ある程度大きな値にすべき M の選択がまずいと,Xiの下位ビットのランダム性が悪化する M は素数にするのが良い 処理を簡便にするため,M = 2e とする実装もある パラメータ a や c の選択についても,ある程度経験則が存在 Properties of linear congruent method have been studied widely. - The length of the cycle of the sequence is M or less. Thus M must be a big number. - If the choice of M is poor, then lower bits of X_i will not be random. It is recommended to choose a prime number for M, but it is OK in some cases to use M as a power of two, which is advantageous in implementation. - There are some heuristics for choosing parameters a and c.

線形合同法の問題点 (Xi, Xi+1) で与えられる2次元平面上の点を考える (Xi, Xi+1) の点が,平面全体に広がることが理想的 ⇒ 直線 x = a 上には,高々一個の点しか配置されない One serious problem of the linear congruence method is that the randomness is poor if the algorithm is used in a high-dimensional context. Consider that two consecutive numbers X_i and X_{i+1} are used to choose one point in a two-dimensional plane. When one uses a PSG, he or she expects that the sampled points distribute all over the plane. However, if we employ the linear congruence method as PSG algorithm, then this expectation is betrayed. In the linear congruence method, X_{i+1} is uniquely determined from X_i. This means that the y-value is uniquely determined for the x-value in the plane. The sampled points align on a certain line on the plane. The example shows hat happens if we choose the equation in the slide is used. This alorithm is not appropriate if we would like to sample a point over a plane. (初期値が 5のとき) Xi+1 = 5Xi + 1 mod 7 のとき... 6 5 4 3 2 平面上の点を一個選んで... という用途には使えない 1 O 1 2 3 4 5 6

M系列法 M系列法 (M-sequence method) 線形フィードバックシフトレジスタを利用して擬似乱数を生成 p 段シフトレジスタ ⇒ 状態数は 2p 通り存在する 結線方法によっては,周期2p–1の擬似乱数系列が生成可能 特異点(全ゼロ状態)以外のすべての状態を経由 ⇒ 2p–1 という周期は,レジスタ数 p 段では最大値(Max) Another well known algorithm is the M-sequence method. In this method, we generate sequences by making use of a linear feedback shift register. If the depth of the register is p, then the circuit has 2^p different states. By carefully designing the value of the feedback switches, the register can generate a sequence whose cycle length is 2^p -1. In this case, the registers experience all the possible states except the all-zero singular state. Note that the cycle length is the maximum among all sequences which can be generated by shift registers with depth p. Xi Xi–1 Xi–p Xi–p+1 Xi–p+2 結線 or 断線

M系列法について 結線方法は,原始多項式の係数にしたがって決定(省略) 生成される系列は,非常に優れた統計的性質を持つ レジスタの初期値の違い ⇒ 生成される系列の位相の違い シフト加法性 高い自己相関性を持つ ⇒ CDMA などの通信方式でも利用 properties of the M-sequence method - the values of feedback switches can be determined algebraically - the generated sequence has superior statistical properties - the difference of the initial contents of the registers does not make different sequences. It just makes a phase shift of an identical sequence - shift additive property: if we shift an M-sequence and add it to the original sequence, then the obtained sequence is again the M-sequence, just phase shifted - strong self correlation property: compute XOR of two sequences which are obtained by phase shifting an identical sequence. The number of zeros in the XOR become small if the two sequences are similar. If the sequence is the M-sequence, then the number of zeros shows strong peak only if the two shift phases are the same. This property is used in some communication systems.

その他の擬似乱数生成法 メルセンヌ・ツイスタ法 (MT法,Mersenne Twister method) 松本眞氏(現広島大),西村拓士氏(現山形大)により提案 メルセンヌ素数を使い,シフトレジスタ法を一捻りした方 高品質の擬似乱数を高速に生成することが可能 予測不能な擬似乱数生成法 いわゆる「記憶のない」乱数生成 ⇒ 暗号的用途に向く Blum 法など 線形合同法,M系列法,MT法は,暗号用途には使えない other algorithms - Mersenne Twister (MT) method - invented by Drs. Matsumot and Nishimura - give one twist of a shift-register based scheme - efficiently generates high-quality pseudorandom numbers - unpredictable PSG - some algorithms generate sequences which seems “memoryless”. This property is contributing in some cryptographic applications. - Blum algorithm is one typical algorithm - linear congruence, M-sequence and MT schemes are not of this type.

まとめ コルモゴロフ的な乱数の定式化 擬似乱数の統計的検定 x2検定 具体的な擬似乱数生成アルゴリズム 線形合同法,M系列法 summary of the class - Kolmogorov complexity - statistical test - algorithms

練習問題 a = 110010111110001000110100101011101100とする(|a|=36) a を長さ2のブロックに区切り,x2値を計算せよ 長さ3,4のブロックに区切り,それぞれx2値を計算せよ 線形合同法を実装し,擬似乱数を生成せよ 上記で生成した擬似乱数を,2次元平面にプロットせよ (スライドの26ページ参照) レポート課題: http://www.naist.jp/~kaji/ 参照,5/10提出 report assignment: available from the above URL, due by May 10 quiz - let a be given as the slide - divide the sequence into blocks of length two, and compute the chi-square value - do the same thing for blocks with length three and four. - write a program of the linear congruence method - use the program to generate a pseudorandom sequence, and plot the obtained data in a two-dimensional array as in page 26 of this slide. Report assignment is given in the URL. The due date is May 10.