Seurat3整合⽅法merge()与IntegrateData()
参考:www.jianshu/p/ebc328f9fb73
周运来就是我
随着单细胞测序技术的成熟,越来越多的研究者选择应⽤该技术来阐释⼿上的⽣物学问题。同时
单细胞也不再是单样本单物种单器官的技术,往往会⽤到多样本整合分析的技术,这⽅⾯Seurat
团队是最值得关注的。他们提出了⼀套⽤于单细胞样本整合分析的算法,Comprehensive integration of single cell data,并打包成Rpackages可以说是很贴⼼了。
其整合单细胞数据的核⼼函数之⼀就是:FindIntegrationAnchors {Seurat}
如其名称所指,是⽤来Finds the integration anchors的,他是如何做到的呢?
?FindIntegrationAnchors
但是在整合的时候有时候整合不成功,特别是应⽤之前的单细胞技术,识别的细胞较少的时候,
如可能会报错:
Filtering Anchors
Error in nn2(data = cn.lls2, ], query = cn.lls1,  :
Cannot find more nearest neighbours than there are points
github上有这个问题的讨论:github/satijalab/seurat/issues/997
有⽤的回答是这样的:
merge函数
I experienced the same problem when integrating very heterogeneous datasets, and it seems to
be related to the size of k.filter parameter.In my case, lowering the default value of 200 to 150 did
the job. However, I am still trying to figure out whether the results are meaningful, and the underlying biological rationale of it, so any explanation would be welcome!
那么k.filter的影响到底是多⼤呢?我们来试⼀下。
我⽤Unravelling subclonal heterogeneity and aggressive disease states in TNBC through
single-cell RNA-seq这篇⽂章的数据集:GSE118390.
library(R.utils)
library(tidyverse)
library(reticulate)
use_python("E:\\conda\\")
gunzip("D:\\Users\\Administrator\\Desktop\\RStudio\\con-cluster\\GSE118389_", remove=FALSE) medata<-read.table("D:\\Users\\Administrator\\Desktop\\RStudio\\con-cluster\\GSE118389_")
dim(medata)
table(str_split(colnames(medata),pattern="_",simplify=T)[,1])
PT039 PT058 PT081 PT084 PT089 PT126
341    96  288  286  333  190
Seurat基本流程:
library(Seurat)
pbmc <- CreateSeuratObject(counts = medata, project = "pbmc3k", lls = 3, min.features = 200)
pbmc <- NormalizeData(pbmc, hod = "LogNormalize", scale.factor = 10000)
pbmc <- FindVariableFeatures(pbmc, hod = "vst", nfeatures = 2000)
pbmc <- ScaleData(pbmc,features=VariableFeatures(pbmc))
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)
pbmc <- RunUMAP(pbmc, dims = 1:10)
pbmc <- RunTSNE(pbmc, dims = 1:10)
?DimPlot
table(pbmc@meta.data$orig.ident)
PT039 PT058 PT081 PT089
127    58    57  237
DimPlot(pbmc, reduction = "umap",label = TRUE,group.by="orig.ident")
这就是没有整合的时候,四个样本的降维结果。
下⾯我们⽤不同的k.filter参数来试⼀下,
pancreas.list <- SplitObject(pbmc, split.by = "orig.ident")
pancreas.list <- pancreas.list[c("PT039", "PT058", "PT081", "PT089")]
for (i in 1:length(pancreas.list)) {
pancreas.list[[i]] <- NormalizeData(pancreas.list[[i]], verbose = FALSE)
pancreas.list[[i]] <- FindVariableFeatures(pancreas.list[[i]], hod = "vst",
nfeatures = 2000, verbose = FALSE)
}
#reference.list <- pancreas.list[c("PT081", "PT089")]
SelectIntegrationFeatures(object.list = pancreas.list)
#?FindIntegrationAnchors
pancreas.anchors <- FindIntegrationAnchors(object.list = pancreas.list, dims = 1:20,k.anchor = 5,k.filter = 30)
#?IntegrateData
pancreas.integrated <- IntegrateData(anchorset = pancreas.anchors, dims = 1:20)
Merging dataset 3 into 1
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0%  10  20  30  40  50  60  70  80  90  100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Merging dataset 2 into 1 3
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0%  10  20  30  40  50  60  70  80  90  100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Merging dataset 4 into 1 3 2
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0%  10  20  30  40  50  60  70  80  90  100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
pancreas.integrated <- ScaleData(pancreas.integrated,features=VariableFeatures(pbmc))
pancreas.integrated <- RunPCA(pancreas.integrated, features = VariableFeatures(object = pbmc))
pancreas.integrated <- FindNeighbors(pancreas.integrated, dims = 1:10)
pancreas.integrated <- FindClusters(pancreas.integrated, resolution = 0.5)
pancreas.integrated <- RunUMAP(pancreas.integrated, dims = 1:10)
pancreas.integrated <- RunTSNE(pancreas.integrated, dims = 1:10)
DimPlot(pancreas.integrated, reduction = "umap",label = TRUE,group.by="orig.ident")+ggtitle("k.filter = 30")
可见,seurat在整合多样本的时候并不会⾃动为研究者提供合适的参数,我们也不应这样要求他们。需要注意的是default虽然是⽤的最多的,并不⼀定是最优的。
还有⼀种⽅式merge()即简单地讲多个数据集放到⼀起,并不运⾏整合分析。
pancreas.list <- SplitObject(pbmc, split.by = "orig.ident")
pancreas.list <- pancreas.list[c("PT039", "PT058", "PT081", "PT089")]
mpancreas.list<-pancreas.list[c("PT039", "PT058", "PT081")]
mymerg<-merge(pancreas.list$PT089,mpancreas.list)
mymerg <- NormalizeData(mymerg, hod = "LogNormalize", scale.factor = 10000)
mymerg <- FindVariableFeatures(mymerg, hod = "vst", nfeatures = 2000) mymerg <- ScaleData(mymerg,features=VariableFeatures(mymerg))
mymerg <- RunPCA(mymerg, features = VariableFeatures(object = mymerg))
mymerg <- FindNeighbors(mymerg, dims = 1:10)
mymerg <- FindClusters(mymerg, resolution = 0.5)
mymerg <- RunUMAP(mymerg, dims = 1:10)
mymerg <- RunTSNE(mymerg, dims = 1:10)
pm<-DimPlot(mymerg, reduction = "umap",label = TRUE,group.by="orig.ident")+ggtitle("merge") CombinePlots(plots = list(p0,pm),legend="bottom")