廣安知之同志社大学工学部知識工学科 tomo@is.doshisha.ac.jp PCクラスタを作ろう！！廣安　知之同志社大学　工学部　知識工学科 tomo@is.doshisha.ac.jp.

廣安知之同志社大学工学部知識工学科 tomo@is.doshisha.ac.jp
PCクラスタを作ろう！！廣安　知之同志社大学　工学部　知識工学科

Cluster clus·ter n. Ⅰ 〔ブドウ・サクランボ・フジの花などの〕房（ふさ）
〔of〕a cluster of grapes 一房のブドウ. Ⅱ 〔同種類のもの・人の〕群れ, 集団〔of〕a cluster of spectators 一団の観客. a cluster of butterflies チョウの群れ. a cluster of stars 星団. in a cluster （一つに）かたまって, 群れをなして. in clusters いくつもの群れをなして, かたまりになって. New College English-Japanese Dictionary, 6th edition (C) Kenkyusha Ltd. 1967,1994,1998

PCクラスタ並列計算機

このチュートリアル講座では．．． PCクラスタという並列計算機計算機の作り方並列計算の仕方ハードウエアソフトウエア
一般的なプログラムの作成進化的計算（GAなど）の並列モデル

導入が極めて簡単（大規模･高度なシステムは難しい）
でも本当は教えたくない！？導入が極めて簡単（大規模･高度なシステムは難しい）並列処理の研究には明日はないかもしれないが，経済問題，バイオインフォマティクスとならんで，並列アプリケーションには夢がある．（閉塞を打破しよう．．．情報処理学会誌　2001.7,8)

並列アプリケーションには明日がある！？高速に処理が可能創発アルゴリズム（特に進化的計算アルゴリズム）は確実に高速化が可能計算パラダイムが変わる？（P2Pなど）創発アルゴリズム（特に進化的計算アルゴリズム）には秘策がある

はずかしい並列（embarrassing parallel)
パラメトリックサーチ

創発アルゴリズム（特に進化的計算アルゴリズム）の秘策
ある探索アルゴリズムで25％の確率で解が発見できる場合 P 初期解のbroadcast 解の収集･ベスト解の選択並列アルゴリズムによって100%の確率で解が発見可能！！

何故，並列処理を行わなければならないか？

何故並列処理を行わなければならないのか？
高速に処理をおこなわなければならない計算コストが高い繰り返しを多く必要とする大規模な問題

進化的計算手法 Features (HA) 生物の遺伝と進化様々な問題に比較的簡単に適用可能計算負荷が高い個体数が多い
１つの仕事はサブタスクに分割可能 High Performance Computing (HA)

特に進化的計算の場合は威力を発揮（後で詳しく説明）

ex. Project rc5 Intel Philanthoropic P2P Program 計算パラダイムの変化計算資源は余っている．研究室のパソコン実習室･演習室のパソコン景気刺激対策で導入された大型計算機その他グローバルコンピューティングインターネット接続による大規模システム P2P NapsterやGnutellaを代表とする分散システム

並列処理の必要性はわかったが何故PCクラスタなのか？

Top５００ http://www.top500.org Parallel Computers Ranking Name # Proc 1
Rmax (Gflops) 1 ASCI White 8192 7226 2 SP Power3 2528 2526 3 ASCI Red 9632 2379 4 ASCI Blue-Pacific 5808 2144 5 SR8000/MPP 1152 1709 Parallel Computers

CPU Pentium Alpha Power etc.
Commodity Hardware CPU Pentium Alpha Power etc. Networking Internet Lan Wan Gigabit cable less etc. PCs + Networking PC Clusters

なぜPCクラスターなのか? hardware Commodity Off-the-shelf Software Open source
Free ware Peopleware 大学の学生，助手，講師研究所のおたく高い性能低コスト簡単なセットアップ簡単な利用占有可能

Top５００ http://www.top500.org Ranking Name # Proc 36 SCore III 1024
Rmax (Gflops) 36 SCore III 1024 547.9 42 CPlant Cluster 512.4 1000 102 LosLobos 512 237 156 CLIC PIII 528 143.3 389 ABACUS Cluster 520 80.8 439 Presto III 104 77.4

並列計算機の分類

簡単なPCクラスタを作ろう！！

PCクラスタ 8nodes + gateway(file server) Fast Ethernet Switching Hub 150万

Hardware CPU memory motherboard hard disc case network card cable hub
何を準備すれば良いのか? Hardware CPU memory motherboard hard disc case network card cable hub Normal PCs

Software OS tools Editor Compiler Parallel Library
何を準備すれば良いのか? Software OS tools Editor Compiler Parallel Library

メッセージパッシング

ソケット「接続の端点」意味：コンピュータとTCP/IPを　　　　　つなぐ出入り口ソケット TCP/IP

ソケット通信ソケットを使って通信を行うには２つのプログラムが必要クライアントプログラムサーバプログラムソケットを用意して
サーバに接続要求を行うサーバプログラムソケットを用意して接続要求を待つ

ソケット通信とは接続待ち接続待ちのサーバをクライアントが探すサーバを探すサーバ側クライアント側

ソケット通信とは接続を受信サーバが見つかったら接続して通信サーバを見つけて接続サーバ側クライアント側

メッセージパッシングライブラリ PVM (Parallel Virtual Machine)
PVM was developed at Oak Ridge National Laboratory and the University of Tennessee. MPI (Message Passing Interface) MPI is an API of message passing. 1992: MPI forum 1994 MPI 1 1887 MPI 2

MPIの実装 Free Implementation Bender Implementation MPICH ： LAM：
WMPI ： Windows 95，NT CHIMP/MPI MPI Light Bender Implementation Implementations of parallel computers MPI/PRO ：

クラスタの構築の手順複数台のPCを用意する PCをつなげる OSとtoolをインストールする開発toolと並列ライブラリをインストール

Linux GNU Compiler, GDB rsh
OS/tools Linux GNU Compiler, GDB rsh

# rpm –ivh lam-6.3.3b28-1.i386.rpm # rpm –ivh mpich-1.2.0-5.i386.rpm
MPICH/LAMのインストール # rpm –ivh lam-6.3.3b28-1.i386.rpm # rpm –ivh mpich i386.rpm # dpkg –i lam2_ deb # dpkg –i mpich_ deb # apt-get install lam2 # apt-get install mpich

ソケット通信の流れ 1 ソケット生成 (socket) ソケット生成 (socket) クライアントサーバ

ソケット通信の流れ 2 サーバを探す (gethostbyname) 接続の準備 (bind/listen) クライアントサーバ

ソケット通信の流れ 3 接続要求 (connect) 接続受理 (accept) OK! クライアントサーバ

ソケット通信の流れ 4 通信 (send/recv) 通信 (send/recv) こんにちわこんにちわクライアントサーバ

ソケット通信の全体の流れクライアントサーバソケット生成(socket) 接続要求(connect) サーバを探す
(gethostbyname) データ送受信(send/recv) ソケットを閉じる(close) ソケット生成(socket) 接続の準備(bind) 接続待機(listen) 識別情報接続受信(accept) データ送受信(send/recv) ソケットを閉じる(close)

クライアントプログラム（C） //client.c #include<stdio.h>
#include<sys/types.h> #include<sys/socket.h> #include<netinet/in.h> #include<netdb.h> #include<string.h> #define PORT (u_short)10000 #define BUF_LEN 100 char hostname[]="localhost"; char buf[BUF_LEN]; main() { struct hostent *servhost; struct sockaddr_in server; int s; servhost = gethostbyname(hostname); bzero((char *)&server,sizeof(server)); server.sin_family = AF_INET; server.sin_port = PORT; bcopy(servhost->h_addr, (char *)&server.sin_addr,servhost->h_length); s = socket(AF_INET,SOCK_STREAM,0); connect(s,(void *)&server,sizeof(server)); read(s,buf,BUF_LEN); printf(buf); close(s); }

サーバプログラム（C） //server.c #include<stdio.h>
#include<sys/types.h> #include<sys/socket.h> #include<netinet/in.h> #include<netdb.h> #include<string.h> #define PORT (u_short)10000 char hostname[] = "localhost"; main() { struct hostent *myhost; struct sockaddr_in me; int s_waiting, s; char msg[] = "Hello World!!\n"; myhost = gethostbyname(hostname); bzero((char *)&me, sizeof(me)); me.sin_family = AF_INET; me.sin_port = PORT; bcopy(myhost->h_addr, (char *)&me.sin_addr,myhost->h_length); s_waiting = socket(AF_INET,SOCK_STREAM,0); bind(s_waiting,(void *)&me,sizeof(me)); listen(s_waiting, 1); s = accept(s_waiting, NULL, NULL); close(s_waiting); write(s, msg, strlen(msg)); close(s); }

クライアントプログラム（Java） import java.io.*; import java.net.*;
import java.lang.*; public class Client{ public static void main( String[] args ){ try{ //ソケットを作成 String host="localhost"; Socket socket = new Socket( host, ); //入力ストリームを作成 DataInputStream is = new DataInputStream ( new BufferedInputStream( socket.getInputStream())); //サーバ側から送信された文字列を受信 byte[] buff = new byte[1024]; int a = is.read(buff); System.out.write(buff, 0, a); //ストリーム，ソケットをクローズ is.close(); socket.close(); }catch(Exception e){ System.out.println(e.getMessage()); e.printStackTrace(); }

サーバプログラム（Java） //Server.java //ストリーム，ソケットをクローズ os.close();
import java.net.*; import java.lang.*; import java.io.*; public class Server{ public static void main( String[] args ){ try{ //ソケットを作成 ServerSocket svSocket = new ServerSocket(10000); //クライアントからのコネクション要求受付 Socket cliSocket = svSocket.accept(); //出力ストリームを作成 DataOutputStream os = new DataOutputStream( new BufferedOutputStream( cliSocket.getOutputStream())); //文字列を送信 String s = new String("Hello World!!\n"); byte[] b = s.getBytes(); os.write(b, 0, s.length()); //ストリーム，ソケットをクローズ os.close(); cliSocket.close(); svSocket.close(); }catch( Exception e ){ System.out.println(e.getMessage()); e.printStackTrace(); }

並列計算の枠組み (MPI) gateway Jobs Tasks user PC-Cluster
Massive parallel computer gateway Jobs Tasks user PC-Cluster

並列計算の分類

MPIの枠組み Communicator # include “mpi.h”
int main( int argc, char **argv ) { MPI_Init(&argc, &argv ) ; MPI_Comm_size( …… )； MPI_Comm_rank( …… ) ; /* parallel procedure */ MPI_Finalize( ) ; return 0 ; } Initialization Communicator Acquiring number of process Acquiring rank Termination

通信方法１対１通信グループ通信 Receive/send data Receive/send data Process A
Process B Receive/send data Receive/send data

１対１通信 [Sending] MPI_Send( void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm) void *buf：Sending buffer starting address (IN) int count：Number of Data (IN) MPI_ Datatype datatype：data type (IN) int dest：receiving point (IN) int tag：message tag (IN) MPI_Comm comm：communicator(IN)

１対１通信 [Receiving] MPI_Recv( void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status status) void *buf：Receiving buffer starting address (OUT) int source：sending point (IN) int tag：Message tag (IN) MPI_Status *status：Status (OUT)

~Hello.c~ #include <stdio.h> #include "mpi.h"
void main(int argc,char *argv[]) { int myid,procs,src,dest,tag=1000,count; char inmsg[10],outmsg[]="hello"; MPI_Status stat; MPI_Init(&argc,&argv); MPI_Comm_rank(MPI_COMM_WORLD,&myid); count=sizeof(outmsg)/sizeof(char); if(myid == 0){ src = 1; dest = 1; MPI_Send(&outmsg,count,MPI_CHAR,dest,tag,MPI_COMM_WORLD); MPI_Recv(&inmsg,count,MPI_CHAR,src,tag,MPI_COMM_WORLD,&stat); printf("%s from rank %d\n",&inmsg,src); }else{ src = 0; dest = 0; } MPI_Finalize();

１対１通信 MPI_Recv(&inmsg,count,MPI_CHAR,src, tag,MPI_COMM_WORLD,&stat);
MPI_Send(&outmsg,count,MPI_CHAR,dest, tag,MPI_COMM_WORLD); MPI_Sendrecv(&outmsg,count,MPI_CHAR,dest, tag,&inmsg,count,MPI_CHAR,src, tag,MPI_COMM_WORLD,&stat);

π計算 -Parallel conversion- 部分に分割分割部分を各プロセッサに割り当て結果を収集 0.5 1 1.5 2 2.5
0.5 1 1.5 2 2.5 3 3.5 4 0.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 x y 部分に分割分割部分を各プロセッサに割り当て結果を収集

グループ通信 Broadcast MPI_Bcast( void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm ) Rank of sending point Data

グループ通信 Communication and operation (reduce) MPI_Reduce( void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm ) Operation handle Rank of receiving point MPI_SUM, MPI_MAX, MPI_MIN, MPI_PROD Operation

π計算の流れ

PCクラスタの高性能化

Intel Pentium III, IV AMD Athlon Transmeta Crusoe
CPU Hardware Intel Pentium III, IV AMD Athlon Transmeta Crusoe

Gigabit Wake On LAN ネットワークの２重化
Network Hardware Ethernet Gigabit Ethernet Myrinet QsNet Giganet SCI Atoll VIA Infinband Gigabit Wake On LAN ネットワークの２重化

Myrinet Myricom社が開発 PCクラスタコンピューティングのデファクト・スタンダードとして期待
EthernetやATMなどより優れた性能，コストパフォーマンスを発揮

各ネットワークの性能比較（１）

各ネットワークの性能比較（２）

従来の通信送信側受信側ユーザアドレス空間ユーザアドレス空間データデータコピーコピーカーネルアドレス空間カーネル
NIC NIC データデータ

ゼロコピー通信送信側受信側ユーザアドレス空間ユーザアドレス空間データデータカーネルアドレス空間カーネルアドレス空間
NIC NIC データデータ

SCSI IDE Raid Diskless Cluster
ハードディスク Hardware SCSI IDE Raid Diskless Cluster ジャーナリングファイルシステム

ケース Hardware Box inexpensive Rack compact maintenance

ソフトウエア Software

OS Linux Kernels Open source network Free ware Features
The /proc file system Loadable kernel modules Virtual consoles Package management

OS Linux Kernels Linux Distributions Red Hat www.redhat.com
Linux Distributions Red Hat Debian GNU/Linux S.u.S.E. Slackware

管理ツール NFS（Network File System) NIS (Network Information System)
NTP (Network Time Protocol) server client

リソースマネージメントとスケジューリングツール
プロセスの分配ロードバランス複数タスクのジョブ管理 CONDOR DQS LSF The Sun Grid Engine

Editor Emacs Language C, C++, Fortran, Java Compiler
プログラミング開発ツール Editor Emacs Language C, C++, Fortran, Java Compiler GNU NAG PGI VAST Absoft Fujitsu Intel products/compilers/index.htm

プログラミング開発ツール Make CVS Debugger Gdb Total View

MPIの実装 mpich Lam http://www-unix.mcs.anl.gov/mpi/index.html
Easy to use High portability for UNIX, NT/Win, Globus Lam High availability

MPICH VS LAM （SMP) DGA Gcc(2.95.3), mpicc -O2 –funroll - loops # node
32 ,2 Processor type Pentium III 700MHz Memory 128 Mbytes OS Linux Network Fast Ethernet，TCP/IP Switching HUB DGA Gcc(2.95.3), mpicc -O2 –funroll - loops

MPICH VS LAM (# process)
# node 8 processor PentiumⅢ 850MHz memory 256 Mbytes OS Linux Network Fast Ethernet，TCP/IP Switching HUB DGA Gcc(2.95.3), mpicc -O2 –funroll - loops

プロファイラー MPE (MPICH) Paradyn http://www.cs.wisc.edu/paradyn/ Vampier

Win用のメッセージパッシングライブラリ
PVM PVM3.4 WPVM MPI mpich WMPI(Critical Software) MPICH/NT (Mississippi State Univ.) MPI Pro (MPI Software Technology)

クラスタのディストリビューション FAI http://www.informatik.uni-koeln.de/fai/
Alinka Mosix Bproc Scyld Score Kondara HPC

Math Library PhiPac from Berkeley FFTW from MIT www.fftw.org Atlas
Automatic Tuned Linear Algebra software ATLAS is an adaptive software architecture and faster than all other portable BLAS implementations and it is comparable with machine specific libraries provided by the vender.

Math Library PETSc PETSc is a large stuite of data structures and routines for both uni and parallel processor scientific computing.

遺伝的アルゴリズム

GAsの並列モデル Master Slave (Micro grained ) Cellular (Fine grained)
Distributed GAs (Island, Coarse grained)

Amdahl’s low r: ratio of parallelization

Master Slave model Master node crossover mutation evaluation selection
evaluate evaluate evaluate client client client client client client client client client a) delivers each individual to slave b) returns the value as soon as finishes calculation c) sends non-evaluated individual from master

Cellular GAs

Distributed Genetic Algorithms (Island GAs)
subpopulation migration

Distributed Genetic Algorithms
island 1 island n island 0 GA GA GA migration Generation GA GA GA Migration interval migration = migration rate

Cambria Visual Technology # node 256 # CPUs: 256 CPU: Pentium III 0.8GHz Memory:32GB(128MB × 256) Hard Disc: Network: FastEithernet Hub:

DGAの並列化効率

DGAsの探索性能

教科書，Web sites，その他

教科書 “Building Linux Clusters” “How to Build Beowulf”
“High Performance Cluster Computing”

教科書 MPI並列プログラミング虎の巻きシリーズ

Web sites IEEE Computer Society Task Force on Cluster Computing
White Paper Cluster top 500 Beowulf Project Beowulf Under Ground

IEEE TFCC 超並列計算研究会 SOFTEK HPC メイリングリスト Debian Beowulf
学会その他 IEEE　TFCC 超並列計算研究会 SOFTEK HPC　メイリングリスト Debian Beowulf

クラスタのコンセプトクラスタの作り方並列GA
このチュートリアルでは…. クラスタのコンセプトクラスタの作り方並列GA

廣安知之同志社大学工学部知識工学科 tomo@is.doshisha.ac.jp PCクラスタを作ろう！！廣安　知之同志社大学　工学部　知識工学科 tomo@is.doshisha.ac.jp.

Similar presentations

Presentation on theme: "廣安知之同志社大学工学部知識工学科 tomo@is.doshisha.ac.jp PCクラスタを作ろう！！廣安　知之同志社大学　工学部　知識工学科 tomo@is.doshisha.ac.jp."— Presentation transcript:

Similar presentations

About project

フィードバック

ログインする

Auth with social network:

廣安 知之 同志社大学 工学部 知識工学科 tomo@is.doshisha.ac.jp PCクラスタを作ろう！！ 廣安 知之 同志社大学 工学部 知識工学科 tomo@is.doshisha.ac.jp.

Similar presentations

Presentation on theme: "廣安 知之 同志社大学 工学部 知識工学科 tomo@is.doshisha.ac.jp PCクラスタを作ろう！！ 廣安 知之 同志社大学 工学部 知識工学科 tomo@is.doshisha.ac.jp."— Presentation transcript:

Similar presentations

About project

フィードバック

廣安知之同志社大学工学部知識工学科 tomo@is.doshisha.ac.jp PCクラスタを作ろう！！廣安　知之同志社大学　工学部　知識工学科 tomo@is.doshisha.ac.jp.

Presentation on theme: "廣安知之同志社大学工学部知識工学科 tomo@is.doshisha.ac.jp PCクラスタを作ろう！！廣安　知之同志社大学　工学部　知識工学科 tomo@is.doshisha.ac.jp."— Presentation transcript: