Parallel Programming in MPI part 2

Answer to the previous exercise
Explain your program from the following points of view:
- How did you solve the problem?
- What was difficult?

Today's Topics
- Non-blocking communication: execute other instructions while waiting for a communication to complete
- Implementation of collective communications
- Measuring the execution time of MPI programs
- Deadlock

Non-blocking communication functions
Non-blocking = proceed to the next instruction without waiting for the current one to complete.
Example: MPI_Irecv and MPI_Wait
- Blocking: MPI_Recv waits for the data to arrive, and only then do the next instructions run.
- Non-blocking: MPI_Irecv returns without waiting for the data, the next instructions run, and MPI_Wait then waits for the data to arrive.

MPI_Irecv: Non-Blocking Receive
Usage:
    int MPI_Irecv(void *b, int c, MPI_Datatype d, int src, int t, MPI_Comm comm, MPI_Request *r);
Parameters: start address for storing received data, number of elements, data type, rank of the source, tag (= 0 in most cases), communicator (= MPI_COMM_WORLD in most cases), request
request: the communication request, used later to wait for the completion of this communication.
Example:
    MPI_Request req;
    MPI_Status status;
    ...
    MPI_Irecv(a, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
    ...
    MPI_Wait(&req, &status);

MPI_Isend: Non-Blocking Send
Usage:
    int MPI_Isend(void *b, int c, MPI_Datatype d, int dest, int t, MPI_Comm comm, MPI_Request *r);
Parameters: start address of the data to send, number of elements, data type, rank of the destination, tag (= 0 in most cases), communicator (= MPI_COMM_WORLD in most cases), request
Example:
    MPI_Request req;
    MPI_Status status;
    ...
    MPI_Isend(a, 100, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    ...
    MPI_Wait(&req, &status);

Non-Blocking Send?
Blocking send (MPI_Send): waits until the send data has been copied somewhere else, that is, until the data has been handed to the network or copied to a temporary buffer.
Non-blocking send (MPI_Isend): does not wait.

Notice: data is indeterminate during non-blocking communication
MPI_Irecv: the value of the variable specified as the receive buffer is indeterminate until MPI_Wait returns.
Example, where A initially holds 10 and the arriving data is 50:
    A = 10;
    MPI_Irecv into A
    ... = A;    /* the value of A here can be 10 or 50 */
    MPI_Wait
    ... = A;    /* the value of A here is 50 */

Notice: data is indeterminate during non-blocking communication
MPI_Isend: if the variable holding the data to be sent is modified before MPI_Wait, the value actually sent is unpredictable.
Example, where A initially holds 10:
    A = 10;
    MPI_Isend of A
    A = 50;     /* modifying A here causes incorrect communication: the data sent may be 10 or 50 */
    MPI_Wait
    A = 100;    /* A can be modified here without any problem */

MPI_Wait
Usage:
    int MPI_Wait(MPI_Request *req, MPI_Status *stat);
Waits for the completion of a non-blocking communication (MPI_Isend or MPI_Irecv). After it returns, the send data can safely be modified and the received data can safely be read.
Parameters: request, status
status: when an MPI_Irecv completes, the status of the received data is stored here.

MPI_Waitall
Usage:
    int MPI_Waitall(int c, MPI_Request *requests, MPI_Status *statuses);
Waits for the completion of the specified number of non-blocking communications.
Parameters: count, requests, statuses
count: the number of non-blocking communications
requests, statuses: arrays of MPI_Request and MPI_Status, each with at least 'count' elements

Inside the collective communication functions
Collective communication functions are usually implemented with point-to-point communication functions such as MPI_Send, MPI_Recv, MPI_Isend and MPI_Irecv.

Inside of MPI_Bcast
One of the simplest implementations:
    int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
    {
        int i, myid, procs;
        MPI_Status st;

        MPI_Comm_rank(comm, &myid);
        MPI_Comm_size(comm, &procs);
        if (myid == root) {
            /* The root sends the data to every other rank, one at a time. */
            for (i = 0; i < procs; i++)
                if (i != root)
                    MPI_Send(a, c, d, i, 0, comm);
        } else {
            /* Every other rank receives the data from the root. */
            MPI_Recv(a, c, d, root, 0, comm, &st);
        }
        return 0;
    }

Another implementation: with MPI_Isend
    #include <stdlib.h>   /* malloc, free */

    int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
    {
        int i, myid, procs, cntr;
        MPI_Status st, *stats;
        MPI_Request *reqs;

        MPI_Comm_rank(comm, &myid);
        MPI_Comm_size(comm, &procs);
        if (myid == root) {
            stats = (MPI_Status *)malloc(sizeof(MPI_Status) * procs);
            reqs = (MPI_Request *)malloc(sizeof(MPI_Request) * procs);
            cntr = 0;
            /* Start all sends without waiting for any of them. */
            for (i = 0; i < procs; i++)
                if (i != root)
                    MPI_Isend(a, c, d, i, 0, comm, &(reqs[cntr++]));
            /* Wait for the completion of all procs-1 sends. */
            MPI_Waitall(procs - 1, reqs, stats);
            free(stats);
            free(reqs);
        } else {
            MPI_Recv(a, c, d, root, 0, comm, &st);
        }
        return 0;
    }

Flow of the Simple Implementation (8 processes, root = rank 0)
Rank 0: Isend to 1, Isend to 2, Isend to 3, Isend to 4, Isend to 5, Isend to 6, Isend to 7, then waitall.
Ranks 1 to 7: Irecv from 0, then wait.

Time for the Simple Implementation
One link can transfer one message at a time.
Total time = T * (P - 1)
T: time for transferring one message
P: number of processes

Another implementation: Binomial Tree
    int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
    {
        int myid, procs;
        MPI_Status st;
        int mask, relative_rank, src, dst;

        MPI_Comm_rank(comm, &myid);
        MPI_Comm_size(comm, &procs);
        /* Rank relative to the root, so that the root becomes 0. */
        relative_rank = myid - root;
        if (relative_rank < 0)
            relative_rank += procs;

        /* Receive phase: receive once, at the lowest set bit of relative_rank. */
        mask = 1;
        while (mask < procs) {
            if (relative_rank & mask) {
                src = myid - mask;
                if (src < 0)
                    src += procs;
                MPI_Recv(a, c, d, src, 0, comm, &st);
                break;
            }
            mask <<= 1;
        }

        /* Send phase: forward the data for every smaller mask, largest first. */
        mask >>= 1;
        while (mask > 0) {
            if (relative_rank + mask < procs) {
                dst = myid + mask;
                if (dst >= procs)
                    dst -= procs;
                MPI_Send(a, c, d, dst, 0, comm);
            }
            mask >>= 1;
        }
        return 0;
    }

Flow of Binomial Tree (8 processes, root = rank 0)
'mask' determines when and how each rank calls Send and Recv.
Rank 0: no receive; mask = 4: Send to 4; mask = 2: Send to 2; mask = 1: Send to 1
Rank 1: mask = 1: Recv from 0
Rank 2: mask = 2: Recv from 0; mask = 1: Send to 3
Rank 3: mask = 1: Recv from 2
Rank 4: mask = 4: Recv from 0; mask = 2: Send to 6; mask = 1: Send to 5
Rank 5: mask = 1: Recv from 4
Rank 6: mask = 2: Recv from 4; mask = 1: Send to 7
Rank 7: mask = 1: Recv from 6

Time for Binomial Tree
Multiple links are used at the same time.
Total time = T * log2(P)
T: time for transferring one message
P: number of processes
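As a worked comparison of the two cost formulas, take the 8-process case used in the flow diagrams. The second line uses P = 1024 purely as an illustrative value, and the formulas assume, as the slides do, that every message takes the same time T.

```latex
\begin{align*}
P = 8:    &\quad T_{\text{simple}} = T\,(P-1) = 7T,
          &\quad T_{\text{binomial}} = T \log_2 P = 3T \\
P = 1024: &\quad T_{\text{simple}} = T\,(P-1) = 1023T,
          &\quad T_{\text{binomial}} = T \log_2 P = 10T
\end{align*}
```

When P is not a power of two, the number of rounds in the binomial tree is $\lceil \log_2 P \rceil$, so the estimate becomes $T \lceil \log_2 P \rceil$.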

Measuring the execution time of MPI programs
MPI_Wtime: returns the current time in seconds as a floating-point number.
Example:
    ...
    double t1, t2;
    ...
    t1 = MPI_Wtime();
    /* work to be timed */
    t2 = MPI_Wtime();
    printf("Elapsed time: %e sec.\n", t2 - t1);

Problem with measuring time in parallel programs
Each process measures a different time. Which one is the time we actually want?
[Figure: ranks 0, 1 and 2 each call t1 = MPI_Wtime() at different points around their Read, Send and Receive steps, so the measured intervals differ.]

Solution using the collective MPI_Barrier
Synchronize the processes with MPI_Barrier before each time measurement. This is suited to measuring the total execution time.
[Figure: ranks 0, 1 and 2 all call MPI_Barrier and then MPI_Wtime, perform their Read, Send and Receive steps, then call MPI_Barrier and MPI_Wtime again, so every rank measures the same interval.]

Detailed analysis
Average: MPI_Reduce can be used to compute the average.
MAX and MIN: use MPI_Gather to gather all of the results on rank 0, and let rank 0 find MAX and MIN.
    double t1, t2, t, total;
    t1 = MPI_Wtime();
    ...
    t2 = MPI_Wtime();
    t = t2 - t1;
    /* Sum the elapsed times of all processes on rank 0. */
    MPI_Reduce(&t, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myrank == 0)
        printf("Ave. elapsed: %e sec.\n", total / procs);

Relationships among Max, Ave and Min
These can be used to check the load balance, that is, how much the amount of work varies from process to process.

                      | Max - Ave is large | Max - Ave is small
    Ave - Min is large | NG                 | Mostly OK
    Ave - Min is small | NG                 | OK

The time includes both computation time and communication time.

Measuring time for communications
    double t1, t2, t3, t4, comm = 0;
    t3 = MPI_Wtime();
    for (i = 0; i < N; i++) {
        /* computation */
        t1 = MPI_Wtime();
        /* communication */
        t2 = MPI_Wtime();
        comm += t2 - t1;    /* accumulate communication time */
        /* computation */
        t1 = MPI_Wtime();
        /* communication */
        t2 = MPI_Wtime();
        comm += t2 - t1;
    }
    t4 = MPI_Wtime();       /* total time is t4 - t3 */

Analyzing computation time
Computation time = Total time - Communication time
Alternatively, measure the computation time directly.
The variation in computation time across processes shows the degree of load imbalance.
Note: communication time is difficult to evaluate on its own, because it includes waiting time caused by load imbalance. Balance the computation first.

Deadlock
A state in which a program cannot proceed for some reason.
Places in MPI programs where deadlocks are likely to occur:
1. MPI_Recv, MPI_Wait, MPI_Waitall
2. Collective communications: a program cannot proceed until all processes call the same collective communication function.

Wrong case (both ranks block in MPI_Recv, so neither reaches its MPI_Send):
    if (myid == 0) {
        MPI_Recv from rank 1
        MPI_Send to rank 1
    }
    if (myid == 1) {
        MPI_Recv from rank 0
        MPI_Send to rank 0
    }

One solution: use MPI_Irecv
    if (myid == 0) {
        MPI_Irecv from rank 1
        MPI_Send to rank 1
        MPI_Wait
    }
    if (myid == 1) {
        MPI_Irecv from rank 0
        MPI_Send to rank 0
        MPI_Wait
    }

Summary
- Effect of non-blocking communication: it separates the start of a communication from waiting for its completion, which makes it possible to overlap communication and computation.
- Implementation of collective communication: collectives are built internally from sends and receives, and the time required depends on the algorithm.
- Measuring the execution time of MPI programs.
- Be careful about deadlocks in parallel programs.

Report: write a Reduce function yourself
Complete the program on the next slide by filling in the body of the 'my_reduce' function.
my_reduce is a simplified version of MPI_Reduce:
- It computes only the total sum of integers.
- The root rank is always 0.
- The communicator is always MPI_COMM_WORLD.
Any algorithm is OK.

    #include <stdio.h>
    #include <stdlib.h>
    #include "mpi.h"

    #define N 20

    int my_reduce(int *a, int *b, int c)
    {
        /* complete here by yourself */
        return 0;
    }

    int main(int argc, char *argv[])
    {
        int i, myid, procs;
        int a[N], b[N];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &procs);
        for (i = 0; i < N; i++) {
            a[i] = i;
            b[i] = 0;
        }
        my_reduce(a, b, N);
        if (myid == 0)
            for (i = 0; i < N; i++)
                printf("b[%d] = %d , correct answer = %d\n", i, b[i], i * procs);
        MPI_Finalize();
        return 0;
    }