Virtualizing a SMP on loosely-coupled computers

Slides:

Advertisements

Similar presentations

Windows Azure ハンズオントレーニング Windows Azure Web サイト入門.

Advertisements

Virtual Editionのご紹介 2012年12月12日.

第3回並列計算機のアーキテクチャと並列処理の実際

情報理工学系研究科コンピュータ科学専攻上嶋裕樹

メモリコンシステンシモデル memory consistency model

榮樂英樹 LilyVM と仮想化技術榮樂英樹

クラウドにおけるネストした仮想化を用いた安全な帯域外リモート管理

Virtual Editionのご紹介（株）ネットジャパン　法人営業部 2012年7月18日 1.

マルチコア時代の並列プログラミング ~ロックとメモリオーダリング~

IaaS 仮想マシン(VM)をネットワーク経由で提供負荷に応じてVM数や性能を変更できるハードウェアの導入・管理・維持コストの削減

データベース工学データベースとはデータモデル関係データベースとＳＱＬ物理データベース編成とインデクス

Virtualizing a Multiprocessor Machine on a Network of Computers

仮想マシンの並列処理性能に対するCPU割り当ての影響の評価

ファイルシステムキャッシュを考慮した仮想マシン監視機構

Microsoft Tech∙Ed Japan 2010

Disco: Running Commodity Operating Systems on Scalable Multiprocessors

第3回 CPUの管理と例外処理 OSによるハードウェアの管理 CPUの構成、動作 CPUの管理例外処理、割り込み処理コンテキストスイッチ

ネストした仮想化を用いた VMの安全な帯域外リモート管理

Tohoku University Kyo Tsukada

Windows Summit /8/2017 © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be.

帯域外リモート管理の継続を実現可能なVMマイグレーション手法

Ibaraki Univ. Dept of Electrical & Electronic Eng.

SAP & SQL Server テクニカルアーキテクチャ概要マイクロソフト株式会社 SAP/Microsoft コンピテンスセンター

Provisioning on Multiple Network(NIC) env

単一システムイメージを提供するための仮想マシンモニタ

GXP・Phoenixによる大規模並列情報処理

サスペンドした仮想マシンのオフラインアップデート

Virtual Editionのご紹介 2012年7月26日.

型付きアセンブリ言語を用いた安全なカーネル拡張

Lazy Release Consistency

AMD64の仮想化技術を利用した仮想マシンモニタの実装

全体ミーティング金田憲二.

“Survey of System Virtualization Techniques” by Robert Rose のまとめ

分散IDSの実行環境の分離による安全性の向上

VMのメモリ暗号化によるクラウド管理者への情報漏洩の防止

VMのメモリ暗号化によるクラウド管理者への情報漏洩の防止

コンピュータ系実験Ⅲ 「ワンチップマイコンの応用」第１週目アセンブリ言語講座

リモートホストの異常を検知するための GPUとの直接通信機構

仮想メモリを用いた VMマイグレーションの高速化

オペレーティングシステムイントロダクション

仮想計算機を用いたサーバ統合における高速なリブートリカバリ

単一システムイメージを提供するための仮想マシンモニタ

豊田正史（Masashi Toyoda）福地健太郎（Kentarou Fukuchi)

第7回授業計画の修正中間テストの解説・復習前回の補足（クロックアルゴリズム・PFF) 仮想記憶方式のまとめ特別課題について

クラウドにおけるIntel SGXを用いた VMの安全な監視機構

通信機構合わせた最適化をおこなう並列化ンパイラ

J-PARC E16実験におけるDAQ-Middleware を用いたDAQソフトウェアの開発

未使用メモリに着目した複数ホストにまたがる仮想マシンの高速化

Intel SGXを用いた仮想マシンの安全な監視機構

複数ホストにまたがって動作する仮想マシンの障害対策

VMMのソフトウェア若化を考慮したクラスタ性能の比較

データ圧縮技術による文字列照合処理の高速化に関する研究

Virtualizing a Multiprocessor Machine on a Network of Computers

第5回メモリ管理（２）オーバレイ方式論理アドレスとプログラムの再配置静的再配置と動的再配置仮想記憶とメモリ階層セグメンテーション

コンピュータアーキテクチャ第 9 回.

千代浩司高エネルギー加速器研究機構素粒子原子核研究所

千代浩司高エネルギー加速器研究機構素粒子原子核研究所

Cell/B.E. のSPE上で動作する安全なOS監視システム

Mondriaan Memory Protection の調査

SpectreとMeltdown ITソリューション塾・第27期 2018年3月20日株式会社アプライド・マーケティング大越章司

コンピュータアーキテクチャ第 9 回.

強制パススルー機構を用いた VMの安全な帯域外リモート管理

SMP/マルチコアに対応した型付きアセンブリ言語

IPmigrate：複数ホストに分割されたVMのマイグレーション手法

回帰テストにおける実行系列の差分の効率的な検出手法

全体ミーティング(9/15) 村田雅之.

L4-Linux のメモリ管理における問題点とその解決策

ソケットの拡張によるJava用分散ミドルウエアの高信頼化

１．２言語処理の諸観点（１）言語処理の利用分野

千代浩司高エネルギー加速器研究機構素粒子原子核研究所

Presentation transcript:

Virtualizing a SMP on loosely-coupled computers Kenji Kaneda

漠然とした問題意識 (1/2) 個人でもCPU負荷の高い仕事はそれなりにある利用可能なCPUを駆使するのが理想ではあるけれど… 例）コンパイル，マルチメディア系アプリ利用可能なCPUを駆使するのが理想ではある遊休なPCは多いはずけれど…

漠然とした問題意識 (2/2) 実際に複数PCを駆使するのは非常に困難一般の人に使われるような分散システムは？ Single Software Imageを提供するソフトウェアも，現実にはあまり使われていない例）Distributed OS, Score, … 一般の人に使われるような分散システムは？

Background Virtual Machine ⇒複数の（遊休）CPUを利用することにより高速化例）VMWare 普及してきたまだまだ遅い ⇒複数の（遊休）CPUを利用することにより高速化

Distributed Virtual Machine Monitor Emulates virtual multiprocessor machine on physical loosely-coupled computers Memory Processor Memory Processor Memory Processor virtual physical Memory Processor Memory Processor Memory Processor Network

イメージ図

Virtual Machine Monitor System Overview (1/3) Process mapping between physical and virtual Guest OS Memory Processor virtual Virtual Machine Monitor physical guest OS VMM physical machine Target Processor: IA32 host OS kernel userland Host OS Host OS Processor Processor Memory Memory Network

Virtual Machine Monitor System Overview (2/3) gcc gcc Scheduling by guest OS Linux Memory Processor 複数CPUの利用により高速化 Real execution virtual Virtual Machine Monitor physical Windows Windows Processor Processor Memory Memory Network

System Overview (3/3) Target CPU Relaxed ◎ Intel® Itanium® ○ Pentium 4, Intel® XeonTM , P6 △ Pentium® , Intel486TM Relaxed Memory Ordering Tight ntel® Itanium® Processor - Manuals

Implementation Issues Virtualization E.g.) CPU privileged level, I/O, Device … Universal problem with VM implementation Shared memory emulation Special problem with multi-processor emulation

Shared Memory Emulation 仮想計算機からは，物理計算機のメモリが一つのメモリとしてみえる必要がある

Virtual Machine Monitor … write x to p … read from p Processor Processor Memory virtual Virtual write Virtual read Virtual Machine Monitor physical Physical memory access Host OS Host OS Communication Processor Processor Memory Memory Network

Rest of This Talk 本当にどれくらい可能か，簡単に考察してみる IA-32 Memory Model VM Memory Model Naïve Implementation Preliminary Experiments Summary

IA32 Memory Model

IA32 Memory Model Processor-ordered Memory Model Relaxed memory ordering Instructions for synchronization E.g.) CMPXCHG, FENCE, … キャッシュについて言及する必要がある What is memory ordering model? special instructions for multi-processor memory management  sequential ordering

Processor-ordered Memory Model ~ Basics ~ Read Can be carried out speculatively Write Cannot be carried out speculatively

Processor-ordered Memory Model ~ Store-buffer (1/3) ~ Write can be buffered (i.e., delayed) Read may go ahead of buffered write Write Read Write Processor Processor まだ実行されない Read プロセッサはwriteの完了をまたずに先に進める Store buffer Shared Bus Memory

Processor-ordered Memory Model ~ Store-buffer (2/3) ~ Store-buffer中のwriteは　ローカルプロセッサのみに反映される Write 1 to p Read from p Read from p Write 1 to p Processor Processor Read from p Read from p 1をread intra-processor snooping Store-bufferにはいっているwriteの結果は，ローカルプロセッサのみ読み込み可能リモートプロセッサにはwrite結果は反映されていない write結果が，ローカルプロセスのみ反映されて，リモートプロセスにはまだ反映されていないローカルプロセス内ではself-consistent can forward bufferd write value to read write x to p read x from p local bypassが起こるリモートプロセッサに対してはwriteのorderのみ保存して，反映される Store buffer 0をread Shared Bus アドレスpの初期値は0 Memory

Processor-ordered Memory Model ~ Store-buffer (3/3) ~ Writeがstore-bufferからはきだされるのは Store-bufferが溢れたとき or 同期命令を実行するとき（後述） …　

Processor-ordered Memory Model ~ Inter-processor coherence ~ Writes by a single processor are observed in the same order by all processors ローカルプロセッサ内での順番が保存されるそれ以外の順番は，保存されない store-bufferingの説明との整合性が取れていないかもしれないプロセッサ間のwriteはWrites form the individual processors on the system bus are NOT ordered w.r.t. each other

Order of writes from individual processors write to A write to B write to C write to A write to B write to C 異なるプロセッサ間では，orderが異なる可能性あり異なるプロセッサ間ではorderが異なる可能性あり Order of writes observed by processors processor #3 processor #4 write to A write to B write to C write to A write to B write to C 許可される例と，されない例がほしい同一プロセッサ内のwriteのorderは保存

Processor-ordered Memory Model ~ Unexpected Executions (1/2) ~ read-modify-write instruction Race-condition may occur Need atomic memory access read from p read from p Processor Processor read from p increment write to p read from p increment write to p 同じアドレスに対して同時に実行すると，問題 write to p write to p Shared Bus 0を読み込み 1を書き込み 0を読み込み 1を書き込み Memory アドレスpの初期値は0

Processor-ordered Memory Model ~ Unexpected Executions (2/2) ~ Dekker’s Algorithm Two processors enter the critical section simultaneously Need to serialize memory access flag1 = 1; if (flag2 == 0) critical section read from flag2 write to flag1 flag2 = 1; if (flag1 == 0) critical section write to flag2 read from flag1 Processor Processor E.g.) writeが全てのプロセッサに対して反映されてから，次の命令を実行したい Store buffer Shared Bus Memory flag1，flag2の初期値は0

Instructions for Synchronization Bus Locking Atomic instructions E.g.) LOCK prefix, exchange instruction Serializing instructions E.g.) FENCE instructions

Bus Locking LOCK instruction prefix Exchange instruction このprefixのついたread-modify-write命令は，atomicに実行される E.g.) LOCK; CMPXCHG Exchange instruction E.g.) XCHG命令 Exchange Register/Memory with Register ※カーネル内部の同期処理やpthreadなどが利用

Serializing Instructions (1/2) E.g.) MFENCE命令プログラムの文面上でこの命令より前に位置するメモリアクセス命令が，全プロセッサに反映される store-bufferにたまったwriteが発行される最近のCPUだとmemory ordering がゆるいので，それを強めるために用いる Enforce ordering and cache coherency ensures data is written to memory before subsequent stores プログラム Write MFENCE Read Write結果を全プロセッサに反映してからReadを実行

Serializing Instructions (2/2) 実行例 Dekkerのアルゴリズムが正しく動作する flag1 = 1; MFENCE; if (flag2 == 0) critical section write to flag1 flag2 = 1; MFENCE; if (flag1 == 0) critical section Processor Processor write to flag1 Store buffer Shared Bus flag1の値は0 flag2の値は0 1 Memory

All writes are buffered until a serializing instruction is issued Remark The following satisfies the specification All writes are buffered until a serializing instruction is issued Processor write … MFENCE Processor write write write Shared Bus Memory

VM Memory Model

Requirements for VM Memory Model Must satisfy specification of IA-32 To run existing OS/applications without modification Can be emulated efficiently

Memory Model Provided by VM Store-bufferを無限長とする全てのwrite命令は，serializing instructionが実行されるまでbuffering それ以外の点は実機と同じ 効率的にエミュレーション可能一回writeするたびに，通信したりする必要はない FENCE命令を実行時のみ，他のPCにstore結果が反映される Bus Lockingの動作はIA-32と同様atomicity

Naïve Implementation

Naïve Implementation 以上に述べたメモリモデルを提供するVM 通常の命令は，Native実行共有メモリのエミュレーションに必要な最低限の命令のみ，エミュレーション実行

システムの構成 Preprocessor Virtual Machine 実行コードを（静的に）変換する Native実行とエミュレーション実行からなる基本はNative実行 Native実行中にシグナルを受信すると，　　　　　　　　エミュレーション実行に移る

Preprocessor 同期命令の直前（直後）に不正な命令を挿入 ※今現在は，アセンブラを手動で編集命令の実行直前（直後）にシグナルをとばすため ※今現在は，アセンブラを手動で編集 … mfence … .byte 0x8e, 0xc8 mfence 変換

Virtual Machine (1/2) プログラムからは，0~n番地までが共有メモリとしてみえる ptrace & mmapなどを利用 ※ページテーブルや特権命令の仮想化などは未実装

Virtual Machine (2/2) 以下をエミュレーション Store-buffer Instructions for Synchronization Bus locking Serializing instructions

Emulation of Store-buffer (1/3) Serializing instruction実行時に以下を行う一つ前のserializing instructionから今までの間のwrite結果を，他のPCに反映メモリ状態のバックアップ Memory MFENCE … write Address Value 1 2 … Address Value 1 2 … Address Value 1 2 … mf1からmf2の間でmodifyされたページごとに， modifiyされる前の状態とmodifyされた後の状態を保存するこれを他のPCに反映させる Diff アドレス1の値1が変更

Emulation of Store-buffer (2/3) 各ページのアクセス保護を変更することにより実現 mprotectを利用 ※オーソドックスなSoftware DSMの実装手法

Emulation of Store-buffer (3/3) 3.アクセスされたページを保存 2.シグナルが発生・トラップ Backup MFENCE write to page 1 writeを禁止 Page番号 write 禁止 1 2 … Page番号 write 禁止 1 許可 2 … Page番号 write 許可 1 2 … Page番号 write 禁止 1 2 … 5. diffを作成・他のPCに送信 4.writeを許可ページ番号1の状態 6. writeを再禁止

Emulation of Bus Locking LOCK; CMPXCHG PC1 PC2 1. lock獲得要求 2. 実行停止 3. ack 4. CMPXCHGを実行 5. lock開放要求 6. 実行再開

Emulation of Serializing Instruction MFENCE PC1 PC2 1. diffの送信 2. diffをメモリに反映 3. ack

Preliminary Experiments

Preliminary Experiments 目的本当に性能が出るか評価する実験内容：以下を測定同期命令エミュレーションのオーバヘッド全実行命令中の同期命令の割合

Overhead of Instruction Emulation Repeat MFENCE 10,000 times Time (sec.) 比率 Native 0.00063 1 Emulation Without memory access 0.84 1325 With memory access 2.57 4052 comm.+ ptrace CPU: Pentium4 Xeon 2.4GHz (Dual processor) CPU: Pentium4 Xeon 2.4GHz (Dual processor) comm. + ptrace + mprotect

Ratio of Sync. Instruction LOCK prefixつき命令の割合 file & console access pthread OS起動中 ls実行 Linux 2.2.19 on Bochs (x86 emulator)

Discussion (1/2) Can achieve high-performance? Heavily depend on OS/applications Depend on overhead of virtualization ※UML上でシステムコールを頻繁に呼ぶと，数千倍遅くなる My VM 4 Speed up 3 2 VMWare １ # of processors １ 2 3 4

Discussion (2/2) Can work correctly? Bus locking may be unfair E.g.) 一つのプロセッサがロックし続ける Some applications may assume a memory model differing from IA-32 memory model?

Related Work VMWare ESX Server [Carl A. Waldspurger, OSDI’02] Target physical machine: SMP Target virtual machine: uni-processor Carl A. Waldspurger

Summary Distributed Virtual Machine Emulates virtual multiprocessor machine on physical loosely-coupled computers

Future Work Performance Improvement Virtualization E.g.) Privileged level, inter-processor interrupt Utilization of idle PCs Migration of VM Dynamic addition/deletion of PCs Adaptation to heterogeneous environments Grand challenge?

References (1/4) ~ VMWare ~ Virtualization system including a virtual machine monitor for a computer with a segmented architecture Devine Scott W, Bugnion Edouard, 　　　　　　　　　　　　　　and Rosenblum Mendel US Patent 6397242, 1999 Virtualizing I/O Devices on VMware Workstation's Hosted Virtual Machine Monitor Jeremy Sugerman, Ganesh Venkitachalam, 　　　　　　　and Beng-Hong Lim USENIX Annual Technical Conference, 2001

References (2/4) ~ Software DSM ~ TreadMarks: Shared Memory Computing on Networks of Workstations C. Amza, A.L. Cox, S.Dwrakadas, P.kelher, H. Lu, R. Rajamonoy W. yu, and W. Zwaenepoel IEEE Computer Vol.29, No.2, pp. 18-28, 1996

References (3/4) ~ Processor Specification ~ IA-32 Intel® Architecture Software Developer's Manual A Formal Specification of Intel® Itanium® Processor Family Memory Ordering 2002

References (4/4) ~ Misc. ~ 機械語命令の静的な変換によるオペレーティング・システムのユーザ・プロセスとしての実行榮樂英樹 and 新城靖 SPA 2003 Shared Memory Consistency Models: A Tutorial Adve, Sarita V, and Kourosh Gharachorloo IEEE Computer Vol.29, No.12, pp. 66-76, 1996

Appendix Sequential memory ordering Uni-processorと同じordering All memory access occurs one at a time Read returns the value of the “last” write to the same location

Processor-ordered Memory Model ~ Summary ~ Read can be performed in any order Writes are always performed in program order program order=プロ部グラムの Writes and following reads can be reordered ※抜けている点について考える

Bus Lockingのエミュレーション他のプロセッサの動作を止めてから実行を行う fenceと同様に命令の直前に不正な命令を挿入することによってトラップ

STORE系命令のエミュレーション書き込みの起こったページを把握しておく mprotectを利用[3] twinの作成

FENCE命令のエミュレーション (2/2) 命令をトラップする図例を入れる ※とりあえず今は手動で挿入以前のfence命令からのメモリ状態の差分を計算し，他の計算機にその情報を送信するプログラムカウンタを増やして次の命令の実行を再開一つ前のfenceから今までの書き込みを　　　　　リモートメモリに反映反映が完了するまで次の命令は実行しない終了したらtwinなどをクリアする

Order of writes from individual processors write A.1 write B.1 write C.1 write A.2 write B.2 write C.2 Order of actual writes from all processors to memory write A.1 write B.1 write A.2 write B.2 write C.1 write C.2 許可される例と，されない例がほしい