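🏈 Guidance | Brief 🎯 Key points
🎯 Relationship between the network location of human disease modules and pathobiology | 🎯 Drug and drug-target interactions | 🎯 Layered interactions between cells and proteins | 🎯 Links between diseases and symptoms | 🎯 Links between drugs and side effects | 🎯 Biological analysis
📜 Graph analysis use cases
📜 Python Louvain and Leiden algorithms for topological decomposition of complex graphs
🍪 Language content breakdown
Python matrix dimensionality-reduction algorithm for drug side effects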
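Predicting drug effects is an important part of drug development, which is the central goal of pharmaceutical science. A large body of information is available on the safety, efficacy, and tolerability of approved drugs, and using it can minimize the cost and time needed to predict the effects of new drugs.
Drug data preparation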
In this study, the observed label matrix is denoted $Y \in \mathbb{R}^{m \times n}$, where $m$ and $n$ are the numbers of drugs and targets, respectively. Each element satisfies $y_{ij} \in \{-1, 0, +1\}$, where +1 denotes a positive label, -1 a negative label, and 0 a missing label. We factorize the interaction matrix $Y$ into two low-rank latent feature matrices $U \in \mathbb{R}^{m \times k}$ and $V \in \mathbb{R}^{n \times k}$, where $k$ is the dimension of the latent feature vectors. We assume that $Y$ can be approximated by the product of $U$ and $V$, found by solving:
$$\underset{U, V}{\arg\min} \left\| R \circ \left( Y - U V^T \right) \right\|_F^2$$
where $\|\cdot\|_F$ denotes the Frobenius norm and $\circ$ denotes the Hadamard (element-wise) product of two matrices. $R \in \mathbb{R}^{m \times n}$ is an indicator matrix with $r_{ij} = p_w$ when $y_{ij} = 1$, $r_{ij} = n_w$ when $y_{ij} = -1$, and $r_{ij} = 0$ otherwise. Here $p_w$ and $n_w$ are the weights of the positive and negative labels, respectively; both default to 1. Because of $R$, only positive and negative labels contribute to the loss, and missing labels incur no loss at all.
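The weighted, masked objective above can be illustrated with a small NumPy sketch. The function names `indicator_matrix` and `masked_loss` and the toy matrix are hypothetical and chosen only for illustration; the sketch evaluates the loss for given $U$ and $V$ and does not perform the optimization.

```python
import numpy as np

def indicator_matrix(Y, pw=1.0, nw=1.0):
    """Build R: pw where y = +1, nw where y = -1, 0 where y = 0 (missing)."""
    R = np.zeros(Y.shape, dtype=float)
    R[Y == 1] = pw
    R[Y == -1] = nw
    return R

def masked_loss(Y, U, V, pw=1.0, nw=1.0):
    """Squared Frobenius norm of R ∘ (Y - U V^T)."""
    R = indicator_matrix(Y, pw, nw)
    residual = R * (Y - U @ V.T)   # Hadamard product zeroes out missing labels
    return float(np.sum(residual ** 2))

# Toy example: 2 drugs, 3 targets, latent dimension k = 2 (illustrative values).
Y = np.array([[1, 0, -1],
              [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
U = rng.normal(size=(2, 2))
V = rng.normal(size=(3, 2))
loss = masked_loss(Y, U, V)
```

With $U$ and $V$ set to zero, the loss reduces to the sum of squared weights over the observed labels, which makes a convenient sanity check that missing entries are ignored.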
Drug neighborhood information is represented by an adjacency matrix $A$, built from the drug similarity matrix $S^d$, whose elements are defined as:
$$A_{i,\mu} = \begin{cases} S^d_{i,\mu} & \text{if } d_\mu \in N(d_i) \\ 0 & \text{otherwise} \end{cases}$$
where $N(d_i)$ is constructed by selecting the $K_1$ drugs most similar to $d_i$. The neighborhood information $B$ for targets (drug target effects) is defined analogously from the target similarity matrix $S^t$:
$$B_{j,v} = \begin{cases} S^t_{j,v} & \text{if } t_v \in N(t_j) \\ 0 & \text{otherwise} \end{cases}$$
The objective that minimizes the distance in the latent space between $d_i$ and its nearest neighbors $N(d_i)$ is:
$$\frac{\alpha}{2} \sum_{i=1}^{m} \sum_{\mu=1}^{m} A_{i,\mu} \left\| u_i - u_\mu \right\|_F^2 = \frac{\alpha}{2} \operatorname{tr}\left( U^T L^d U \right)$$
Python pseudocode in practice:
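As a quick numerical check of the neighborhood term, the sketch below builds a nearest-neighbor adjacency matrix from a similarity matrix and compares the pairwise-distance sum with the trace form. The text does not define $L^d$; the sketch assumes $L^d = (D + \tilde{D}) - (A + A^T)$, with $D$ and $\tilde{D}$ the diagonal matrices of row and column sums of $A$, which is the form that makes the identity hold when $A$ is not symmetric. The helper names and the random toy data are hypothetical.

```python
import numpy as np

def knn_adjacency(S, k):
    """For each row i, keep the similarities to its k most similar other items."""
    n = S.shape[0]
    A = np.zeros_like(S, dtype=float)
    for i in range(n):
        order = np.argsort(-S[i])                      # most similar first
        neighbors = [j for j in order if j != i][:k]   # exclude the item itself
        A[i, neighbors] = S[i, neighbors]
    return A

def neighborhood_laplacian(A):
    """Assumed form: L = (D + D~) - (A + A^T), D = row sums, D~ = column sums."""
    D_row = np.diag(A.sum(axis=1))
    D_col = np.diag(A.sum(axis=0))
    return (D_row + D_col) - (A + A.T)

rng = np.random.default_rng(0)
m, k_latent = 6, 3
S = rng.random((m, m))
S = (S + S.T) / 2              # symmetric toy similarity matrix
np.fill_diagonal(S, 1.0)
U = rng.normal(size=(m, k_latent))

A = knn_adjacency(S, k=2)      # k plays the role of K_1
L_d = neighborhood_laplacian(A)

# Left-hand side: sum over all pairs of A[i, j] * ||u_i - u_j||^2
pairwise = sum(A[i, j] * np.sum((U[i] - U[j]) ** 2)
               for i in range(m) for j in range(m))
# Right-hand side: tr(U^T L^d U)
trace_form = np.trace(U.T @ L_d @ U)
```

Both sides carry the same factor $\alpha / 2$ in the objective, so it is omitted here. The trace form is what makes the term cheap to differentiate with respect to $U$.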
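import sys
from collections import defaultdict

import numpy as np
import pandas as pd
from tqdm import tqdm

from _utils import performance_compare as pc
# The code below calls nrbdmf.NRBdMF and nrbdmf.calc_auc_aupr, but the source does not
# show this import; the module path is an assumption and may need adjusting.
import nrbdmf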
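# Assumption: base_dir is not defined in the source; point it at the project root.
base_dir = '.'

# Drug x disease label table, similarity matrices, and binary label tables.
# The source shows the same file name for both similarity matrices and for both binary tables.
df = pd.read_csv(base_dir + '/Data/table.csv', index_col=0)
drugMat = pd.read_csv(base_dir + '/Data/sim.csv', index_col=0)
diseaseMat = pd.read_csv(base_dir + '/Data/sim.csv', index_col=0)
ind_bi = pd.read_csv(base_dir + '/Data/binary.csv', index_col=0)  # presumably indication labels
se_bi = pd.read_csv(base_dir + '/Data/binary.csv', index_col=0)   # side-effect labels

# Align the binary tables with the rows (drugs) and columns (diseases) of df.
use_ind = ind_bi[df.columns.tolist()].loc[df.index.tolist()]
use_se = se_bi[df.columns.tolist()].loc[df.index.tolist()]

# Mappings from IDs to readable names.
drug2name = pd.read_pickle(base_dir + '/Data/name.pkl')
list(drug2name.keys())[0]  # notebook-style peek at one key
name2pt = pd.read_pickle(base_dir + '/Data/cui.pkl')
# The source loads name2pt but then uses an undefined pt2name; inverting the
# mapping is an assumption about the orientation of the pickled dictionary.
pt2name = {v: k for k, v in name2pt.items()}

drugs = df.index.tolist()
diseases = df.columns.tolist()
ann_ind = [drug2name.get(k) for k in drugs]
ann_col = [pt2name.get(k) for k in diseases]

# Relabel every table with readable drug and disease names.
df.index = ann_ind
df.columns = ann_col
drugMat.index = ann_ind
drugMat.columns = ann_ind
diseaseMat.index = ann_col
diseaseMat.columns = ann_col
use_ind.index = ann_ind
use_ind.columns = ann_col
use_se.index = ann_ind
use_se.columns = ann_col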
# Leave-one-disease-out evaluation: hide all drug entries of disease r, then predict them.
# Assumption: the source never shows how r is set; the first disease is only a placeholder.
r = df.columns.tolist()[0]

cv_data = defaultdict(list)
idx = df.columns.tolist().index(r)
intMat = np.array(df.T)                      # disease x drug
num_drugs, num_disease = intMat.T.shape
test_data = np.array([[k, j] for k in [idx] for j in range(num_drugs)], dtype=np.int32)
x, y = test_data[:, 0], test_data[:, 1]
test_label = intMat[x, y]
W = np.ones(intMat.shape)                    # 1 = visible during training
W[x, y] = 0                                  # 0 = held out for testing
cv_data[ann_col[idx]].append((W, test_data, test_label))

for W, test_data, test_label in cv_data[r]:
    model = nrbdmf.NRBdMF(K1=5, K2=5, num_factors=50, theta=1.0,
                          lambda_d=0.625, lambda_t=0.625, alpha=0.1, beta=0.1,
                          max_iter=5000, tolx=1e-5,
                          positive_weight=0.1, negative_weight=0.1,
                          missing_base=0, indicator=True, half_mask=False, verbose=False)
    model.fix_model(W, np.array(df).T, diseaseMat, drugMat, seed=123)
    posi_val, nega_val = model.ex_evaluation(test_data, test_label)
    print(posi_val, nega_val)

# processing output
pred_res = model.pred_res
pred_res.index = df.index.tolist()
se_label = use_se[r].tolist()  # side effect binary label (before merging)
pred_res['se_label'] = se_label
ind_label = use_ind[r].tolist()
pred_res['ind_label'] = ind_label
pred_res['name'] = pred_res.index.tolist()
pred_res = pred_res.sort_values('scores', ascending=False)
auroc, aupr = nrbdmf.calc_auc_aupr(y_true=pred_res['se_label'].tolist(),
                                   y_score=pred_res['scores'].tolist(), title=r)
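The masking step used in the evaluation above can be seen in isolation on a toy matrix. This is a minimal sketch under the assumption that the held-out unit is one full row of the interaction matrix; `hold_out_row` and the toy values are hypothetical names and data, not part of the `nrbdmf` package.

```python
import numpy as np

def hold_out_row(int_mat, idx):
    """Return (W, test_data, test_label) with every entry of row idx hidden."""
    n_cols = int_mat.shape[1]
    # (row, column) index pairs of all entries in the held-out row
    test_data = np.array([[idx, j] for j in range(n_cols)], dtype=np.int32)
    x, y = test_data[:, 0], test_data[:, 1]
    test_label = int_mat[x, y]          # true labels of the hidden entries
    W = np.ones(int_mat.shape)          # 1 = visible during training
    W[x, y] = 0                         # 0 = held out for testing
    return W, test_data, test_label

toy = np.array([[1, 0, -1, 0],
                [0, 1, 0, -1],
                [1, -1, 0, 0]])
W, test_data, test_label = hold_out_row(toy, idx=1)
```

The weight matrix `W` has the same shape as the interaction matrix, so it can be multiplied element-wise into the loss to hide the held-out row while leaving every other entry untouched.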
# Leave-one-drug-out evaluation: hide all disease entries of drug r, then predict them.
# Assumption: the source never shows how r is set; the first drug is only a placeholder.
r = df.index.tolist()[0]

cv_data = defaultdict(list)
idx = df.index.tolist().index(r)
intMat = np.array(df)                        # drug x disease
# The source unpacked intMat.T.shape here, which swapped the two names;
# the loop below still covers every disease column of the held-out drug.
num_drugs, num_disease = intMat.shape
test_data = np.array([[k, j] for k in [idx] for j in range(num_disease)], dtype=np.int32)
x, y = test_data[:, 0], test_data[:, 1]
test_label = intMat[x, y]
W = np.ones(intMat.shape)                    # 1 = visible during training
W[x, y] = 0                                  # 0 = held out for testing
cv_data[ann_ind[idx]].append((W, test_data, test_label))

for W, test_data, test_label in cv_data[r]:
    model = nrbdmf.NRBdMF(K1=5, K2=5, num_factors=50, theta=1.0,
                          lambda_d=0.625, lambda_t=0.625, alpha=0.1, beta=0.1,
                          max_iter=5000, tolx=1e-5,
                          positive_weight=0.1, negative_weight=0.1,
                          missing_base=0, indicator=True, half_mask=False, verbose=False)
    model.fix_model(W, np.array(df), drugMat, diseaseMat, seed=123)
    posi_val, nega_val = model.ex_evaluation(test_data, test_label)
    print(posi_val, nega_val)

# processing output
pred_res = model.pred_res
pred_res.index = df.columns.tolist()
se_label = use_se.loc[r].tolist()
pred_res['se_label'] = se_label
ind_label = use_ind.loc[r].tolist()
pred_res['ind_label'] = ind_label
pred_res['name'] = pred_res.index.tolist()
pred_res = pred_res.sort_values('scores', ascending=False)
auroc, aupr = nrbdmf.calc_auc_aupr(y_true=pred_res['se_label'].tolist(),
                                   y_score=pred_res['scores'].tolist(), title=r)
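The internals of `nrbdmf.calc_auc_aupr` are not shown here, so the following is only a rough NumPy sketch of how AUROC and AUPR (computed as average precision) can be obtained from binary labels and prediction scores. It is not the package's implementation; the function names and the toy labels are hypothetical, and the sketch assumes that both classes are present.

```python
import numpy as np

def auroc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true != 1]
    diff = pos[:, None] - neg[None, :]       # all positive-negative score differences
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def average_precision(y_true, y_score):
    """Mean of precision@rank taken at the ranks where a positive appears."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
    hits = (y_true[order] == 1).astype(float)
    precision_at_rank = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_rank * hits).sum() / hits.sum())

# Toy labels and scores (illustrative values only).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
roc = auroc(labels, scores)
ap = average_precision(labels, scores)
```

The pairwise AUROC is quadratic in the number of samples, which is acceptable for a single held-out row or column but would be replaced by a rank-based computation on larger inputs.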