L3. Search for Decisions in Games

Slides:



Advertisements
Similar presentations
ベイズの定理と ベイズ統計学 東京工業大学大学院 社会理工学研究科 前川眞一. 2 Coffe or Tea 珈琲と紅茶のどちらが好きかと聞いた場合、 Star Trek のファンの 60% が紅茶を好む。 Star Wars のファンの 95% が珈琲を好む。 ある人が紅茶を好むと分かったとき、その人が.
Advertisements

だい六か – クリスマスとお正月 ぶんぽう. て form review ► Group 1 Verbs ► Have two or more ひらがな in the verb stem AND ► The final sound of the verb stem is from the い row.
Humble and Honorific Language By: Word-Master Leo, Mixer of Ill Beats.
て -form - Making て -form from ます -form -. With て -form, You can say... ~てもいいですか? (= May I do…) ~てください。 (= Please do…) ~ています。 (= am/is/are doing…) Connecting.
Essay writing rules for Japanese!!. * First ・ There are two directions you can write. ・よこがき / 横書き (same as we write English) ・たてがき / 縦書き (from right to.
VE 01 え form What is え form? え? You can do that many things with え form?
英語特別講座 疑問文 #1    英語特別講座 2011 疑問文.
All Rights Reserved, Copyright (C) Donovan School of English
The Bar バー.
1.
第1回レポートの課題 6月15日出題 今回の課題は1問のみ 第2回レポートと併せて本科目の単位を認定 第2回は7月に出題予定
日本語の文法 文型(ぶんけい)をおぼえよう!
Chapter 11 Queues 行列.
Ex7. Search for Vacuum Problem
日本語... ジェパディー! This is a template for you to use in your classroom.
Recognise, ask about and talk about purpose
と.
3月6日(金曜日) 漢字 #6-10 Verbs! (continued) Particles Time References
Ex8. Search for Vacuum Problem(2)
ひな祭り.
Lispとは ゲーム理論 minimaxアルゴリズム αβアルゴリズム ソースコードの一部
What did you do, mate? Plain-Past
Verb Plain Negativeform
Only One Flower in the World
日本人の英語文章の中で「ENJOY」はどういうふうに使われているのか
Noun の 間(に) + Adjective Verb てform + いる間(に) during/while.
Ch13-1 TB250 てフォーム.
How do you talk about Positions/ Locations?
なんがつにいきますか? What month do you/will you go?
A 02 I like sushi! I like origami!
十年生の 日本語 Year 10 Writing Portfolio
Reasonので + Consequence clause
The future tense Takuya Mochizuki.
Chapter 4 Quiz #2 Verbs Particles を、に、で
定期考査2 英語.
The Sacred Deer of 奈良(なら)
Did he/she just say that? Get your head out of the gutter! Oh wait….
“You Should Go To Kyoto”
VTA 02 What do you do on a weekend? しゅうまつ、何をしますか。
Nihongo Japanese 日本ご ‘Numbers ’ & ‘Hiragana Revision’
ストップウォッチの カード ストップウォッチの カード
Input slides 1 – 11 – teacher says word - pupils repeat – word appears on click Ohayoo. おはよう。
National adviser Japanese Yuriko Kayamoto
Causative Verbs Extensively borrowed from Rubin, J “Gone Fishin’”, Power Japanese (1992: Kodansha:Tokyo) Created by K McMahon.
全国粒子物理会 桂林 2019/1/14 Implications of the scalar meson structure from B SP decays within PQCD approach Yuelong Shen IHEP, CAS In collaboration with.
-Get test signed and make corrections
くれます To give (someone gives something to me or my family) くれました くれます
Term paper, Report (1st, first)
WELCOME TO THE WORLD OF DRAGON BALL
Where is Wumpus Propositional logic (cont…) Reasoning where is wumpus
疑問詞 1年生で学習した疑問詞.
Michael Jeffrey Jordan
Part time jobs in restaurant
Question Words….
クイズやゲーム形式で紹介した実例です。いずれも過去のインターン作です。
いくらですか?.
Ex7. Search for Vacuum Problem
Expressing uncertainty: Might
2019/4/22 Warm-up ※Warm-up 1~3には、小学校外国語活動「アルファベットを探そう」(H26年度、神埼小学校におけるSTの授業実践)で、5年生が撮影した写真を使用しています(授業者より使用許諾済)。
Term paper, report (2nd, final)
ー生命倫理の授業を通して生徒の意識に何が生じたかー
The difference between adjectives and adverbs
Created by L. Whittingham
東北大 情報科学 田中和之,吉池紀子 山口大 工 庄野逸 理化学研究所 岡田真人
~て しまう.
第八課文法二 Chapter 8 Grammar 2
Grammar Point 2: Describing the locations of objects
Indirect Speech 間接話法 Kaho.I.
点素パス問題に対するアルゴリズム 小林 佑輔 東京大学 大学院情報理工学系研究科 組合せ最適化セミナー 2012 年 7月 13日
Improving Strategic Play in Shogi by Using Move Sequence Trees
Presentation transcript:

L3. Search for Decisions in Games Minimax algorithm - algorithm Tic-Tac-Toe game

decision minimax a limited time search space uncertainty complexity consequence analyze terminal test utility function optimal strategy back up outcome

decision 決定 minimax 最小最大 a limited time 有限的な時間 search space 探索空間 uncertainty 不確定性 complexity 複雑さ consequence (続いて起こる)結果 analyze 分析 terminal test 終わり状態テスト utility function 有益性評価関数 optimal strategy 最適の方法 back up 逆の方向 outcome 結果

Games as search problems Game playing is one of the oldest areas of endeavor in AI. What makes games really different is that they are usually much too hard to solve within a limited time. For chess game: there is an average branching factor of about 35, games often go to 50 moves by each player, so the search tree has about 35100 (there are only 1040 different legal position). The result is that the complexity of games introduces a completely new kind of uncertainty that arises not because there is missing information, but because one does not have time to calculate the exact consequences of any move. In this respect, games are much more like the real world than the standard search problems. But we have to begin with analyzing how to find the theoretically best move in a game problem. Take a Tic-Tac-Toe as an example.

Perfect two players game Two players are called MAX and MIN. MAX moves first, and then they take turns moving until the game is over. The problem is defined with the following components: initial state: the board position, indication of whose move, a set of operators: legal moves. a terminal test: state where the game has ended. an utility function, which give a numeric value, like +1, -1, or 0. So MAX must find a strategy, including the correct move for each possible move by Min, that leads to a terminal state that is winner and the go ahead make the first move in the sequence. Notes: utility function is a critical component that determines which is the best move.

Minimax ゲームは、自分にとっては最も有利な手を自分が打ち(max)、次に相手が自分にとって最も不利な手を打ち(min)、それらが交互に繰り返されることによって成り立ちます。 It includes five steps: Generate the whole game tree. Apply the utility function to each terminal state to get its value. Use the utility of the terminal states to determine the utility of the nodes one level higher up in the search tree. Continue backing up the values from the leaf nodes toward the root. MAX chooses the move that leads to the highest value. A2 A31 A1 A3 A11 A12 A13 A22 A23 A21 A32 A33 5 2 4 6 14 8 12 3 L1 (maximum in L2) L2 (minimum in L3) L3

A2 A1 A3 MinMax (GamePosition game) { return MaxMove (game); } A31 A11 5 2 4 6 14 8 12 3 MaxMove (GamePosition game) { if (GameEnded(game)) { return EvalGameState(game); } else { best_move <- {}; moves <- GenerateMoves(game); ForEach moves { move <- MinMove(ApplyMove(game)); if (Value(move) > Value(best_move)) { best_move <- move; } return best_move; MinMove (GamePosition game) { best_move <- {}; moves <- GenerateMoves(game); ForEach moves { move <- MaxMove(ApplyMove(game)); if (Value(move) < Value(best_move)) { best_move <- move; } return best_move;

To sum up: So the MAX player will try to select the move with highest value in the end. But the MIN player also has something to say about it and he will try to select the moves that are better to him, thus minimizing MAX's outcome <Minimax 法> 想定される最大の損害が最小になるように決断を行う戦略。将棋やチェスなどコンピュータに思考させるためのアルゴリズムの一つ。 実行例1

- pruning (Alpha-Beta法) 実行例2 3 L1 (maximum in L2) A1 A2 A3 =3 2 L2 (minimum in L3) A11 A12 A13 A21 A22 A23 A31 A32 A33 L3 =-999 =3 =999 初期値 8  =2 4 6 14 5 2 =3 =2  >=  A121 A122 A123 2 3 1 Pruning this branch of the tree to cut down time complexity of search so as to speed up minimax search = 12 5 9 =12 =3  > 

MaxMove (GamePosition game, Integer alpha, Integer beta) { if (GameEnded(game) || DepthLimitReached()) { return EvalGameState(game, MAX); } else { best_move <- {}; moves <- GenerateMoves(game); ForEach moves { move <- MinMove(ApplyMove(game), alpha, beta); if (Value(move) > Value(best_move)) { best_move <- move; alpha <- Value(move); // Ignore remaining moves if (alpha >= beta) return best_move; MinMove (GamePosition game, Integer alpha, Integer beta) { if (GameEnded(game) || DepthLimitReached()) { return EvalGameState(game, MIN); } else { best_move <- {}; moves <- GenerateMoves(game); ForEach moves { move <- MaxMove(ApplyMove(game), alpha, beta); if (Value(move) < Value(best_move)) { best_move <- move; beta <- Value(move); // Ignore remaining moves if (beta < = alpha) return best_move; =12 =3  >=  =3 =2  >= 

まとめ ゲームは、自分にとっては最も有利な手を自分が打ち(max)、次に相手が自分にとって最も不利な手を打ち(min)、それらが交互に繰り返されることによって成り立ちます。 <α-β 法(狩り> Minimaxを改良したもの。枝刈りを行うことでMinimaxより評価するノードを抑えている <Minimax algorithmとα-β algorithmの違い> Minimax法ではすべてを探索し最良の手を選択するのに対して、α-β法は、minimax法で採用されないと判断された手については、そこから先を探索しないことで無駄な探索に費やす時間をカットしている。また、α-β法による結果はminimax法での結果と同じになる。 枝刈りを行うことにより探索がminimax法より早く終わるのでα-β法のほうが効率的である。 11

Tic Tac Toe game Applet http://www.tetonsoft.com/nhiro/java/tictactoe.html

Tic Tac Toe game In SymTic.java else { int minValue = 999; Vector v_child = this.children('o'); for (int i=0; i<v_child.size(); i++) { SymTic st = (SymTic)v_child.elementAt(i); int value = st.evaluate(depth-1, MAX, minValue); if (value < minValue) { minValue = value; // minValue =  } if (value <= refValue) { // refValue =   return value; } } return minValue; }} Tic Tac Toe game  In SymTic.java // 評価する. public int evaluate(int depth, int level, int refValue) { int e = evaluateMyself(); if ((depth==0)||(e==99)||(e==-99)||((e==0)&&(Judge.finished(this)))) { return e; } else if (level == MAX) { int maxValue = -999; Vector v_child = this.children(usingChar); for (int i=0; i<v_child.size(); i++) { SymTic st = (SymTic)v_child.elementAt(i); //st is a move int value = st.evaluate(depth, MIN, maxValue); if (value > maxValue ) { maxChild = st; maxValue = value; //maxValue =  } if (value > = refValue) { //refValue =  return value; return maxValue; private int evaluateMyself() { char c = Judge.winner(this); if (c == usingChar) { //win the game return 99; } else if (c != ' ') { //lose the game return -99; } else if (Judge.finished(this)) { //draw the game return 0; }

実行例1 実行例2 Quiz:Take a look of two web sites and make your own example tree. 実行例1 実行例2