Seven Databases in Seven Weeks HBase. HDFS (Hadoop Distributed File System) Server DFS HBase.

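How Seven Databases in Seven Weeks organizes the material
- Day 1: CRUD and table administration
    - Run HBase standalone
    - Create tables
    - Put data in and get it out
- Day 2: Working with big data
    - Load a Wikipedia dump
    - Get used to operating HBase from scripts (not the shell)
- Day 3: Taking it to the cloud (not covered this time)
    - Operate HBase through Thrift
    - Deploy to EC2 with Whirr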

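Features of HBase
- Automatic sharding and automatic failover
    - When a table grows large, it is split automatically.
    - The resulting shards fail over automatically when a node fails.
- Data consistency (CAP: Consistency)
    - An update is readable from the moment it is applied.
    - HBase does not relax this to "the same value becomes readable eventually" (eventual consistency).
- Hadoop/HDFS integration
    - HBase can be deployed on top of Hadoop's HDFS.
    - Hadoop/MapReduce can use HBase as its input and output without going through an extra API.
- A choice of interfaces
    - Besides the native Java API, HBase is usable through Thrift and a REST API.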

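Day 1: Deploying HBase standalone
Commands run:
    [root@HBase01 ask]# cd /opt/
    [root@HBase01 opt]# wget http://ftp.meisei-u.ac.jp/mirror/apache/dist/hbase/hbase-0.94.7/hbase-0.94.7.tar.gz
    [root@HBase01 opt]# tar zxvf hbase-0.94.7.tar.gz
    [root@HBase01 opt]# vi hbase-0.94.7/conf/hbase-site.xml
hbase-site.xml:
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>file:///var/files/hbase</value>
      </property>
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/var/files/zookeeper</value>
      </property>
    </configuration>
Where the files actually end up:
    /var
      /files
        /hbase
        /zookeeper
This is the minimal configuration needed to run on a single machine. It only sets where files are placed; any directory can be given as the write destination.
All settings that can be specified in the XML are listed in src/main/resources/hbase-default.xml.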

Day 1: Deploying HBase standalone
A JDK is required:
    [root@HBase01 opt]# hbase-0.94.7/bin/start-hbase.sh
    +======================================================================+
    | Error: JAVA_HOME is not set and Java could not be found              |
    +----------------------------------------------------------------------+
    | Please download the latest Sun JDK from the Sun Java web site        |
    | > http://java.sun.com/javase/downloads/ <                            |
    |                                                                      |
    | HBase requires Java 1.6 or later.                                    |
    | NOTE: This script will find Sun Java whether you install using the   |
    |       binary or the RPM based installer.                             |
    +======================================================================+
JDK options (choose one of the following and install it): Oracle JDK or OpenJDK, version 1.6 or 1.7.
Specify the Java installation directory:
    [root@HBase01 opt]# vi hbase-0.94.7/conf/hbase-env.sh
    - # export JAVA_HOME=/usr/java/jdk1.6.0/
    + export JAVA_HOME=/usr/java/latest/

Day 1: Deploying HBase standalone
Starting and stopping:
    [root@HBase01 opt]# hbase-0.94.7/bin/start-hbase.sh
    starting master, logging to /opt/hbase-0.94.7/bin/../logs/hbase-root-master-HBase01.db.algnantoka.out
    [root@HBase01 opt]# hbase-0.94.7/bin/stop-hbase.sh
    stopping hbase...........
Connecting with the shell:
    [root@HBase01 opt]# hbase-0.94.7/bin/hbase shell
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 0.94.7, r1471806, Wed Apr 24 18:48:26 PDT 2013
    hbase(main):001:0> status
    1 servers, 0 dead, 2.0000 average load

Day 1: Using HBase
Creating a table: create
    hbase(main):009:0> help "create"
    Create table; pass table name, a dictionary of specifications per
    column family, and optionally a dictionary of table configuration.
    Dictionaries are described below in the GENERAL NOTES section.
    Examples:
      hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
      hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
      hbase> # The above in shorthand would be the following:
      hbase> create 't1', 'f1', 'f2', 'f3'
      hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
      hbase> create 't1', 'f1', {SPLITS => ['10', '20', '30', '40']}
      hbase> create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}
      hbase> # Optionally pre-split the table into NUMREGIONS, using
      hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
      hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
Basic form:
    create 'TableName', {NAME => 'ColumnFamilyName', Option => Value ...} ...
Shorthand:
    create 'TableName', 'ColumnFamilyName', ...

Day 1: Using HBase
Inserting a record: put
    hbase(main):010:0> help "put"
    Put a cell 'value' at specified table/row/column and optionally
    timestamp coordinates. To put a cell value into table 't1' at
    row 'r1' under column 'c1' marked with the time 'ts1', do:
      hbase> put 't1', 'r1', 'c1', 'value', ts1
Building SampleTable:
    create 'SampleTable', 'color', 'shape'
    put 'SampleTable', 'first', 'color:red', '#F00'
    put 'SampleTable', 'first', 'color:blue', '#00F'
    put 'SampleTable', 'first', 'color:yellow', '#FF0'

Day 1: Using HBase
Fetching a record: get
    hbase(main):011:0> help "get"
    Get row or cell contents; pass table name, row, and optionally
    a dictionary of column(s), timestamp, timerange and versions. Examples:
      hbase> get 't1', 'r1'
      hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
      hbase> get 't1', 'r1', {COLUMN => 'c1'}
      hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
      hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
      hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
      hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
      hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
      hbase> get 't1', 'r1', 'c1'
      hbase> get 't1', 'r1', 'c1', 'c2'
      hbase> get 't1', 'r1', ['c1', 'c2']
Reading from SampleTable:
    get 'SampleTable', 'first'                  # the whole row
    get 'SampleTable', 'first', 'color'         # one column family
    get 'SampleTable', 'first', 'color:blue'    # one column

Day 1: Using HBase
Searching records: scan
    hbase(main):001:0> help 'scan'
    Scan a table; pass table name and optionally a dictionary of scanner
    specifications. Scanner specifications may include one or more of:
    TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
    or COLUMNS, CACHE
    If no columns are specified, all columns will be scanned.
    To scan all members of a column family, leave the qualifier empty as in
    'col_family:'.
    The filter can be specified in two ways:
    1. Using a filterString - more information on this is available in the
       Filter Language document attached to the HBASE-4176 JIRA
    2. Using the entire package name of the filter.
    Some examples:
      hbase> scan '.META.'
      hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
      hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
      hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
      hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
      hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
    For experts, there is an additional option -- CACHE_BLOCKS -- which
    switches block caching for the scanner on (true) or off (false). By
    default it is enabled. Examples:
      hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
    Also for experts, there is an advanced option -- RAW -- which instructs the
    scanner to return all cells (including delete markers and uncollected deleted
    cells). This option cannot be combined with requesting specific COLUMNS.
    Disabled by default. Example:
      hbase> scan 't1', {RAW => true, VERSIONS => 10}

Day 1: Using HBase
Timestamps: each put to the same cell ('first', 'color:red') adds a new version with a newer timestamp.
    put 'table', 'first', 'color:red', '#FFF'    # timestamp 1
    put 'table', 'first', 'color:red', '#000'    # timestamp 2
    put 'table', 'first', 'color:red', '#0F0'    # timestamp 3
    put 'table', 'first', 'color:red', '#00F'    # timestamp 4
    put 'table', 'first', 'color:red', '#F00'    # timestamp 5
Reading the versions back:
    get 'table', 'first', 'color:red'                            # the newest version
    get 'table', 'first', {COLUMN=>'color:red', TIMESTAMP=>4}    # the version at timestamp 4
    get 'table', 'first', {COLUMN=>'color:red', VERSIONS=>4}     # up to 4 recent versions, limited by how many the column family retains
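The version behavior on this slide can be modeled in a few lines of plain Ruby. This is only a sketch for intuition: `VersionedCell` and its methods are hypothetical names, not HBase API, and the limit of 3 retained versions is an assumption chosen for the illustration.

```ruby
# Toy model of one HBase cell that keeps several timestamped versions.
# VersionedCell is a hypothetical name; nothing here calls HBase.
class VersionedCell
  def initialize(max_versions)
    @max_versions = max_versions # assumed retention limit for the column family
    @versions = {}               # timestamp => value
  end

  # Store a value under a timestamp, then drop versions beyond the limit.
  def put(timestamp, value)
    @versions[timestamp] = value
    newest = @versions.keys.sort.reverse.first(@max_versions)
    @versions.select! { |ts, _| newest.include?(ts) }
    self
  end

  # With `timestamp`, return only that exact version (if still retained).
  # Otherwise return up to `versions` values, newest first.
  def get(timestamp: nil, versions: 1)
    return [@versions[timestamp]].compact if timestamp
    @versions.keys.sort.reverse.first(versions).map { |ts| @versions[ts] }
  end
end

cell = VersionedCell.new(3)
['#FFF', '#000', '#0F0', '#00F', '#F00'].each_with_index do |value, i|
  cell.put(i + 1, value) # timestamps 1..5, as on the slide
end

latest  = cell.get                # newest version only
at_four = cell.get(timestamp: 4)  # the version written at timestamp 4
recent  = cell.get(versions: 4)   # asks for 4, but only 3 are retained
```

In this model, asking for more versions than the cell retains simply returns what is left, which is why raising the retained count (as the alter slide does with VERSIONS) matters if older values need to stay readable.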

Day 1: Using HBase
Changing the schema: alter
    hbase(main):009:0> disable 'table1'
    0 row(s) in 2.5190 seconds
    hbase(main):010:0> get 'table1', 'first','color:red'
    COLUMN CELL
    ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: table1 is disabled.
    hbase(main):012:0> alter 'table1', { NAME => 'color', VERSIONS => 10}
    Updating all regions with the new schema...
    1/1 regions updated.
    Done.
    0 row(s) in 1.3630 seconds
    hbase(main):014:0> enable 'table1'
    0 row(s) in 2.3000 seconds
- The table targeted by alter must be offline (disabled).
- The alter shown here changes the number of versions that are retained.
- A schema change through alter proceeds as follows:
    1. Create an empty table with the new schema.
    2. Copy the data over from the original table.
    3. Drop the original table.
- Because this is expensive, as a rule do not change the schema (that is, the column families).

Day 1: Using HBase
JRuby scripting
hbase shell is an extended JRuby interpreter, so it can run JRuby code.
hoge.rb:
    include Java
    # HBase-related Java classes
    import org.apache.hadoop.hbase.client.HTable
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.HBaseConfiguration
    # Convert each argument to a Java byte array, as the HBase client API expects
    def jbytes(*args)
      args.map { |arg| arg.to_s.to_java_bytes }
    end
    table = HTable.new( HBaseConfiguration.new, "table1" )
    # One Put for row "third" carrying three cells
    p = Put.new( *jbytes( "third" ) )
    p.add( *jbytes( "color", "black", "#000" ) )
    p.add( *jbytes( "shape", "triangle", "3" ) )
    p.add( *jbytes( "shape", "square", "4" ) )
    table.put( p )
Running it:
    [root@HBase01 opt]# hbase-0.94.7/bin/hbase shell hoge.rb
    hbase(main):002:0> get 'table1', 'third',{COLUMN => ['color','shape']}
    COLUMN CELL
    color:black timestamp=1369049856405, value=#000
    shape:square timestamp=1369049856405, value=4
    shape:triangle timestamp=1369049856405, value=3
    9 row(s) in 0.0870 seconds
Insertion timing: because the cells are written together, their timestamps are identical.

What is HBase?
    Role                  Google's internal systems      Hadoop project
                          (from the published papers)    (Google clones)
    File system           Google File System (GFS)       Hadoop Distributed File System (HDFS)
    Batch processing      MapReduce                      MapReduce
    Real-time response    BigTable                       HBase

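BigTable (a sorted, column-oriented database)
- RowKey: required; rows are kept sorted by it.
- ColumnFamily (ColumnFamily1, ColumnFamily2, ColumnFamily3): defined in the schema.
- Column: schemaless (columns can be added freely), so each row may hold a different set. In the example table, row 1 holds Column1 and Column2 while row 2 holds Column2 and Column3.
- The values of a given column are versioned by timestamp: #FFF (timestamp 1), #000 (timestamp 2), #0F0 (timestamp 3), #00F (timestamp 4), #F00 (timestamp 5).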
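As a rough mental model of the layout above, a table can be pictured as a map from row key to columns, where families are fixed up front and qualifiers are free-form. The sketch below is plain Ruby with hypothetical names (`ToyTable`, `put`, `get`, `scan`); it imitates the shell verbs but is not HBase API and ignores timestamps entirely.

```ruby
# Toy model: row key => { "family:qualifier" => value }.
# ToyTable is a hypothetical name; nothing here calls HBase.
class ToyTable
  def initialize(*families)
    @families = families # fixed by the "schema"
    @rows = {}
  end

  # Column families must exist already; qualifiers can be anything.
  def put(row_key, column, value)
    family = column.split(':', 2).first
    raise ArgumentError, "unknown column family: #{family}" unless @families.include?(family)
    (@rows[row_key] ||= {})[column] = value
  end

  def get(row_key)
    @rows.fetch(row_key, {})
  end

  # Rows come back in row-key order, starting at start_row, at most `limit` rows.
  def scan(start_row: '', limit: nil)
    keys = @rows.keys.sort.select { |key| key >= start_row }
    keys = keys.first(limit) if limit
    keys.map { |key| [key, @rows[key]] }
  end
end

table = ToyTable.new('color', 'shape')
table.put('second', 'shape:triangle', '3')
table.put('first', 'color:red', '#F00')
table.put('first', 'color:blue', '#00F')
table.put('third', 'color:black', '#000')

scanned_keys = table.scan(start_row: 'g').map(&:first) # rows at or after "g", in key order
```

Insertion order does not matter here: the scan always walks row keys in sorted order, which is the property that makes range reads over adjacent keys cheap.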

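BigTable (a sorted, column-oriented database)
Regions (the example table has RowKeys 1 through 9 and the families ColumnFamily1, ColumnFamily2, ColumnFamily3, divided into regions)
- A table is physically split (sharded) into regions.
- Each region is served by a region server in the cluster.
- Within a region, data is stored separately for each column family.
- Regions divide the sorted RowKeys into ranges of a suitable size.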
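Because regions cover contiguous ranges of sorted row keys, finding the region for a row amounts to finding the last region whose start key is not greater than the row key. The following is a minimal sketch of that idea in plain Ruby; the `Region` struct, the split points, and the server names are all made up for illustration and do not reflect how the HBase client actually caches region locations.

```ruby
# Toy model of region lookup. Boundaries and server names are hypothetical.
Region = Struct.new(:start_key, :server)

# The owning region is the last one whose start key is <= the row key.
# `regions` must be sorted by start key; '' means "from the start of the table".
def region_for(regions, row_key)
  regions.select { |region| region.start_key <= row_key }.last
end

regions = [
  Region.new('',  'regionserver-1'), # row keys before "g"
  Region.new('g', 'regionserver-2'), # row keys from "g" up to "p"
  Region.new('p', 'regionserver-3')  # row keys from "p" onward
]

owner_of_first  = region_for(regions, 'first').server
owner_of_hello  = region_for(regions, 'hello').server
owner_of_second = region_for(regions, 'second').server
```

Row keys that sort next to each other land in the same region, so a range scan over them touches one server, while keys that share no prefix are spread across the cluster.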

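BigTable (a sorted, column-oriented database)
- A table is physically split (sharded) into regions.
- Each region is served by a region server in the cluster.
- Within a region, data is stored separately for each column family.
- Regions divide the sorted RowKeys into ranges of a suitable size.
- HBase is not structured for searches that use a Column or ColumnFamily as the condition.
What this means for design:
- Do not add ColumnFamilies casually; handle new needs by adding Columns wherever possible.
- Shape the RowKey so that related rows sit next to each other and sequential access happens naturally.
- The initial design of the table schema is extremely important.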

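Features of HBase (recap)
- Automatic sharding and automatic failover
    - When a table grows large, it is split automatically.
    - The resulting shards fail over automatically when a node fails.
- Data consistency (CAP: Consistency)
    - An update is readable from the moment it is applied.
    - HBase does not relax this to "the same value becomes readable eventually" (eventual consistency).
- Hadoop/HDFS integration
    - HBase can be deployed on top of Hadoop's HDFS.
    - Hadoop/MapReduce can use HBase as its input and output without going through an extra API.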

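What makes HBase's features possible
- Automatic sharding and automatic failover: automatic splitting of regions
- Data consistency (CAP: Consistency): ??
- Hadoop/HDFS integration: HDFS is a GFS clone and HBase is a BigTable clone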

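What makes HBase's features possible: automatic failover and data consistency (CAP: Consistency)
[Diagram labels: Master Server; ZooKeeper; Region Server with a Region, a WAL, an in-memory store, and a local store, serving Read and Write; a second Region Server (the failover target) with its own Region, WAL, and local store; HDFS underneath, with data marked "replicate".]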
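One way to read the diagram: a write is appended to the WAL before it is applied to the in-memory store, so a failover target that can read the log can rebuild the state that was lost. The sketch below models only that idea in plain Ruby. `ToyRegionServer`, `recover`, and `replay` are hypothetical names, the array standing in for the WAL is an assumption, and the real recovery path involves the Master Server and ZooKeeper in ways this toy ignores.

```ruby
# Toy model of a region server's write path. All names are hypothetical.
class ToyRegionServer
  attr_reader :wal

  def initialize(wal = [])
    @wal = wal      # stands in for the write-ahead log kept on shared storage
    @memstore = {}  # in-memory store, lost if the server crashes
  end

  # Append to the log first, then apply the edit in memory.
  def put(row_key, value)
    @wal << [row_key, value]
    @memstore[row_key] = value
  end

  # Reads are answered by the single server that owns the row.
  def get(row_key)
    @memstore[row_key]
  end

  # Apply a logged edit to memory without logging it again.
  def replay(row_key, value)
    @memstore[row_key] = value
  end

  # A failover target rebuilds the in-memory state by replaying the log in order.
  def self.recover(wal)
    server = new(wal)
    wal.each { |row_key, value| server.replay(row_key, value) }
    server
  end
end

primary = ToyRegionServer.new
primary.put('first', '#F00')
primary.put('first', '#00F')

# Simulate a crash: only the log survives, and a standby takes over from it.
standby = ToyRegionServer.recover(primary.wal.dup)
value_after_failover = standby.get('first')
```

Since every row is served by exactly one server at a time and the log preserves write order, the standby ends up with the most recently written value rather than a stale one.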

Day 2: Working with Wikipedia data
Ran out of steam \(^0^)/

Seconds taken by a scan
