Milo是一种用于单细胞RNA测序数据的差异丰度分析方法,能够检测不同条件下细胞邻域的组成变化。
milo算法简介
milo是一种专门为单细胞RNA测序数据设计的差异丰度分析方法。其核心思想是通过构建细胞邻域(neighborhoods)来检测不同实验条件下细胞群体组成的变化。
个人理解这个算法的设计目的是在不预先定义的情况下,找出两种情况下有群体比例差异的细胞,有了检测结果后,可以将这些有差异的细胞单独提出来进行进一步的分析。
R环境下的miloR实现
R语言中使用miloR包进行分析,以下是相应的实现:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
| library(Seurat) library(miloR) library(SingleCellExperiment) library(scater) library(dplyr) library(tidyr)
seurat_obj <- readRDS("seurat.rds") sce <- as.SingleCellExperiment(seurat_obj)
set.seed(4466)
milo_obj <- Milo(sce)
milo_obj <- buildGraph( milo_obj, k = 30, d = 20, transposed = TRUE, reduced.dim = "UMAP" )
milo_obj <- makeNhoods( x = milo_obj, prop = 0.1, k = 30, d = 20, refined = TRUE, reduced_dims = "UMAP", refinement_scheme = "graph" )
milo_obj <- countCells( milo_obj, meta.data = data.frame(colData(milo_obj)), sample = "sample_id" )
traj_design <- data.frame(colData(milo_obj))[,c("sample_id", "group")] %>% distinct() %>% mutate( sample_id = as.character(sample_id), group = as.character(group) ) rownames(traj_design) <- traj_design$sample_id traj_design <- traj_design[colnames(nhoodCounts(milo_obj)), , drop=FALSE]
da_results <- testNhoods( milo_obj, design = ~ 0 + group, model.contrasts = c('groupCase - groupControl'), design.df = traj_design, fdr.weighting = "graph-overlap" )
da_results %>% arrange(SpatialFDR) %>% filter(SpatialFDR < 0.1) %>% tail(3)
milo_obj <- buildNhoodGraph(milo_obj)
|
Python环境下的Milo实现
Python中有多个milo实现,除了官方miloR项目中提到的INVALID POST SLUG PROVIDED milopy,pertpy库也实现Milo的算法,以下是用pertpy完成计算的例子
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
| import numpy as np import pertpy as pt import scanpy as sc import pandas as pd
adata = sc.read_h5ad('adata.h5ad')
milo = pt.tl.Milo() mdata = milo.load(adata)
sc.pp.neighbors( mdata["rna"], use_rep="X_umap", n_pcs=20, n_neighbors=30, )
milo.make_nhoods(mdata["rna"], prop=0.1) mdata = milo.count_nhoods(mdata, sample_col="sample_id")
mdata["rna"].obs["group"] = mdata["rna"].obs["group"].cat.reorder_categories(["Control", "Case"])
milo.da_nhoods(mdata, design="~group", solver="pydeseq2")
milo.build_nhood_graph(mdata)
significant_results = mdata['milo'].var.query('SpatialFDR < 0.1')
|
后记
实际运行的性能上,两者差距木有太大,但是计算结果上还是又差的,毕竟py版本是算法实现,而不是一点点复现。具体的计算一致性几何,后续我从cellxgene上找个数据集再来实际算算好了。