Identifying splits with clear separation: a new class discovery method for gene expression data
Anja von Heydebreck et al.
Presenter: 上嶋裕樹 (uejima@is.s.u-tokyo.ac.jp)

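Biological background
Microarrays are a powerful tool for studying the relationship between the phenotype of cells and their molecular characteristics.
Two important topics in the analysis of microarray gene expression data are class prediction and class discovery.
Class prediction: assign tissue samples to categories defined in advance by phenotype.
Class discovery: find relationships among genes, among tissue samples, or between genes and tissue samples.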

The class discovery method of this paper
ISIS (identifying splits with clear separation)
Introduces the diagonal linear discriminant (DLD) score and finds bipartitions of the tissue samples at which this score attains a local maximum.
Uses a fast heuristic to find local maxima of the DLD score, taking the mean expression profiles of gene clusters as input.

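Formalizing the data
Data matrix X = (x_{gj}): rows correspond to genes (g = 1,…,k) and columns to tissue samples (j = 1,…,n).
For subsets [formula not shown] of {1,…,n}, the pair [formula not shown] is called a bipartition or split when the following holds: [formula not shown]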
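As a rough illustration of the objects defined on this slide, a split can be stored as a set of sample indices together with its complement. The sketch below assumes that a valid split only requires both sides to be non-empty (the slide's exact condition is not preserved), and the helper names `complement`, `is_split` and `count_splits` are hypothetical. It also hints at why a heuristic search is needed: n samples admit 2^(n-1) - 1 distinct splits.

```python
def complement(M, n):
    """Samples of {1,...,n} that are not in M."""
    return set(range(1, n + 1)) - set(M)

def is_split(M, n):
    """Assumed validity condition: M and its complement are both non-empty."""
    M = set(M)
    return M <= set(range(1, n + 1)) and 0 < len(M) < n

def count_splits(n):
    """Number of unordered splits of n samples into two non-empty classes."""
    return 2 ** (n - 1) - 1

print(complement({1, 3}, 4))   # the other side of the split
print(is_split({1, 3}, 4))
print(count_splits(4))         # 7
```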

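Diagonal Linear Discriminant Analysis (DLDA) (1)
To which subset of the split [formula not shown] should a sample y = (y_1,…,y_k) be assigned? If [formula not shown] holds, y is assigned to M; the other subset is handled in the same way.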
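The assignment rule itself is not preserved in this transcript. The sketch below therefore assumes the rule as DLDA is commonly stated: y goes to M when its variance-scaled squared distance to the class mean of M, summed over genes, is smaller than its distance to the mean of the other class, with one pooled within-class variance per gene. The function names and the toy data are illustrative only.

```python
from statistics import mean

def pooled_variance(a, b):
    """Pooled within-class variance of one gene (assumed to be > 0)."""
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    return ss / (len(a) + len(b) - 2)

def dlda_assigns_to_M(X, in_M, y):
    """Return True if profile y is assigned to M, False otherwise.

    X    : list of k gene rows, each holding n expression values
    in_M : list of n booleans, True where sample j belongs to M
    y    : list of k expression values of the sample to classify
    """
    dist_M = dist_other = 0.0
    for row, y_g in zip(X, y):
        a = [v for v, m in zip(row, in_M) if m]
        b = [v for v, m in zip(row, in_M) if not m]
        var = pooled_variance(a, b)
        dist_M += (y_g - mean(a)) ** 2 / var
        dist_other += (y_g - mean(b)) ** 2 / var
    return dist_M < dist_other

# Toy data: 2 genes, 6 samples, the first three samples form M.
X = [[1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
     [5.0, 5.1, 4.9, 2.0, 2.2, 1.9]]
in_M = [True, True, True, False, False, False]
print(dlda_assigns_to_M(X, in_M, [1.1, 5.0]))   # resembles the samples in M
print(dlda_assigns_to_M(X, in_M, [3.0, 2.0]))   # resembles the other class
```

Using a single variance per gene, and ignoring covariances between genes, is what makes the discriminant "diagonal".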

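Diagonal Linear Discriminant Analysis (DLDA) (2)
Two-sample t-statistic: indicates, for each gene, how strongly the gene is related to the classification. Only the genes with a large absolute value are kept; the rest are discarded.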
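A minimal sketch of this gene filter follows. It assumes the equal-variance (pooled) form of the two-sample t-statistic, and the number of genes to keep, `p`, is a free parameter here because the slide does not state a value. `select_genes` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean

def t_statistic(a, b):
    """Two-sample t-statistic with pooled variance (assumed to be > 0)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled_sd = sqrt(ss / (na + nb - 2))
    return (ma - mb) / (pooled_sd * sqrt(1 / na + 1 / nb))

def select_genes(X, in_M, p):
    """Indices of the p genes with the largest |t| for the split in_M."""
    scored = []
    for g, row in enumerate(X):
        a = [v for v, m in zip(row, in_M) if m]
        b = [v for v, m in zip(row, in_M) if not m]
        scored.append((abs(t_statistic(a, b)), g))
    scored.sort(reverse=True)
    return sorted(g for _, g in scored[:p])

# Toy data: genes 0 and 2 separate the two classes, gene 1 does not.
X = [[1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
     [2.0, 2.5, 1.8, 2.1, 2.4, 1.9],
     [5.0, 5.1, 4.9, 2.0, 2.2, 1.9]]
in_M = [True, True, True, False, False, False]
print(select_genes(X, in_M, 2))   # [0, 2]
```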

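Diagonal Linear Discriminant (DLD) Score
The DLD score of a split [formula not shown] is defined as follows: [formula not shown]
In other words, it is the two-sample t-statistic of [formula not shown].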
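Since the defining formula is missing from the transcript, the following is only one plausible reading of the score, stated as an assumption: keep the p genes with the largest |t|, project every sample onto the DLDA axis whose weight for gene g is the class-mean difference divided by the pooled variance, and take the absolute two-sample t-statistic of the projected values. All names and the toy data are illustrative.

```python
from math import sqrt
from statistics import mean

def gene_stats(row, in_M):
    """Return (t-statistic, DLDA weight) of one row for the split in_M."""
    a = [v for v, m in zip(row, in_M) if m]
    b = [v for v, m in zip(row, in_M) if not m]
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    var = ss / (len(a) + len(b) - 2)            # pooled variance, assumed > 0
    t = (ma - mb) / sqrt(var * (1 / len(a) + 1 / len(b)))
    return t, (ma - mb) / var

def dld_score(X, in_M, p):
    """|t| of the samples projected onto the axis of the p top-|t| genes."""
    stats = [gene_stats(row, in_M) + (row,) for row in X]
    stats.sort(key=lambda s: abs(s[0]), reverse=True)
    top = stats[:p]
    # Projection z_j = sum over selected genes of weight_g * x_gj.
    z = [sum(w * row[j] for _, w, row in top) for j in range(len(in_M))]
    t, _ = gene_stats(z, in_M)
    return abs(t)

X = [[1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
     [2.0, 2.5, 1.8, 2.1, 2.4, 1.9],
     [5.0, 5.1, 4.9, 2.0, 2.2, 1.9]]
clear = [True, True, True, False, False, False]    # matches the structure in X
mixed = [True, False, True, False, True, False]    # cuts across it
print(dld_score(X, clear, 2), dld_score(X, mixed, 2))
```

On this toy matrix the split that follows the visible structure scores far higher than the one that cuts across it, which is the behaviour the score is meant to capture.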

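Running the classification (1)
First, candidate bipartitions are found with an efficient heuristic.
Instead of the expression levels of single genes, the genes are clustered and the mean expression level of each cluster is used, because it is more stable. This can be written as an augmented data matrix [formula not shown].
For a cut point [formula not shown], a bipartition can be determined as follows: [formula not shown]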
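The cut-point idea can be sketched as follows, under assumptions: the samples are sorted by the mean profile of one gene cluster, and each cut point m separates the m samples with the smallest values from the remaining n - m. The `min_size` parameter and the function names are hypothetical, and the acceptance condition for candidates is not implemented because it is not preserved in the transcript.

```python
from math import sqrt
from statistics import mean

def t_statistic(a, b):
    """Two-sample t-statistic with pooled variance (assumed to be > 0)."""
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled_sd = sqrt(ss / (len(a) + len(b) - 2))
    return (ma - mb) / (pooled_sd * sqrt(1 / len(a) + 1 / len(b)))

def mean_profile(X, cluster):
    """Average expression profile of the gene rows listed in cluster."""
    return [mean(X[g][j] for g in cluster) for j in range(len(X[0]))]

def splits_from_profile(profile, min_size=2):
    """Yield (m, in_M, t) for every cut point m of the sorted profile."""
    n = len(profile)
    order = sorted(range(n), key=lambda j: profile[j])
    for m in range(min_size, n - min_size + 1):
        low = set(order[:m])                       # the m smallest values
        in_M = [j in low for j in range(n)]
        high_vals = [profile[j] for j in order[m:]]
        low_vals = [profile[j] for j in order[:m]]
        yield m, in_M, t_statistic(high_vals, low_vals)

# Two co-expressed genes treated as one cluster.
X = [[0.0, 2.0, 0.2, 2.2, 0.1, 1.8],
     [0.2, 2.0, 0.2, 2.0, -0.1, 2.0]]
profile = mean_profile(X, [0, 1])
best = max(splits_from_profile(profile), key=lambda s: s[2])
print(best[0], best[1])   # cut point and split with the largest t
```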

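Running the classification (2)
If the two-sample t-statistic [formula not shown] satisfies the following condition, the bipartition defined by [formula not shown] is adopted as a candidate: [formula not shown]
Starting from these candidates, a greedy search looks for local maxima.
F_nm: the distribution function of the two-sample t-statistic for the m smallest and the n-m largest of n independent identically distributed normal random variables.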
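The greedy step could look like the sketch below. It assumes that a move flips the class of a single sample and that the search stops when no single flip raises the score; the slides do not spell out the exact move set, so this is an interpretation. The score function is passed in as a parameter, and the demo uses a simple single-profile |t| as a stand-in for the DLD score. `min_size` is a hypothetical guard against degenerate classes.

```python
from math import sqrt
from statistics import mean

def greedy_local_max(in_M, score, min_size=2):
    """Flip one sample at a time while the score strictly increases."""
    current = list(in_M)
    best = score(current)
    while True:
        best_move = None
        for j in range(len(current)):
            trial = current.copy()
            trial[j] = not trial[j]
            size_M = sum(trial)
            if size_M < min_size or len(trial) - size_M < min_size:
                continue                      # keep both classes non-trivial
            s = score(trial)
            if s > best:                      # remember the best flip so far
                best, best_move = s, j
        if best_move is None:
            return current, best              # no flip improves: local maximum
        current[best_move] = not current[best_move]

def abs_t(profile, in_M):
    """Stand-in score: |t| of one profile for the split in_M."""
    a = [v for v, m in zip(profile, in_M) if m]
    b = [v for v, m in zip(profile, in_M) if not m]
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    var = ss / (len(a) + len(b) - 2)
    return abs(ma - mb) / sqrt(var * (1 / len(a) + 1 / len(b)))

profile = [0.1, 2.0, 0.2, 2.1, 0.0, 1.9, 0.15, 2.05]
start = [True, True, True, False, True, False, True, False]  # sample 1 misplaced
split, value = greedy_local_max(start, lambda s: abs_t(profile, s))
print(split, value)
```

Because every accepted move strictly increases the score and the number of splits is finite, the loop always terminates, though only at a local maximum.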

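Experimental conditions
Data sets used:
Leukemia: 72 samples, 6817 genes.
Lymphoma/leukemia: 62 samples, 4026 genes.
Melanoma: 31 samples, 6971 genes.
2000 genes were selected and clustered by centroid linkage hierarchical clustering based on the correlation coefficient.
For each data set, 700 candidate bipartitions were reduced to 100, by clustering among other steps, and these served as the starting points of the search.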

Usefulness of the DLD score (1)
Histograms of the DLD score and related quantities for randomly generated bipartitions.

Usefulness of the DLD score (2)
How well does the DLD score characterize biologically meaningful groups of samples?

Results of running the algorithm (1)
Leukemia, Lymphoma/leukemia

Results of running the algorithm (2)
Melanoma

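Summary and future work
A mathematical criterion was introduced that characterizes cancer subtypes represented in gene expression data sets.
An algorithm was presented that uses this criterion to discover subtypes without prior knowledge.
Variable selection (the selection of genes) is important, and its effect on class discovery needs to be studied more systematically. Often most genes are unrelated to the phenotype under study and contribute only noise.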

References
Anja von Heydebreck, Wolfgang Huber, Annemarie Poustka and Martin Vingron: Identifying splits with clear separation: a new class discovery method for gene expression data, Bioinformatics, Vol. 17 Suppl, pp. S107-S114.
