object,使用loess拟合出来的对象;
newdata,可选数据框,在里面寻找变量并进行预测;
se,是否计算标准误差;
对NA值的处理
生物数据分析中,我们想查看PCR扩增出来的扩增子的测序深度之间的差异,但不同的扩增子的扩增效率受到GC含量的影响,因此我们首先应该排除掉GC含量对扩增子深度的影响。
数据amplicon 测序数据,处理后得到的每个amplicon的深度,每个amplicon的GC含量,每个amplicon的长度
先用loess进行曲线的拟合 gcCount.loess <- loess(log(RC+0.01)~GC,data=RC_DT,control = loess.control(surface = "direct"),degree=2)
画出拟合出来的曲线
predictions1<- predict (gcCount.loess,RC_DT$GC) #plot scatter and line plot(RC_DT$GC,log(RC_DT$RC+0.01),cex=0.1,xlab="GC Content",ylab=expression(paste("log(NRC"["lib"],"+0.01)",sep=""))) lines(RC_DT$GC,predictions1,col = "red")取残差,去除GC含量对深度的影响
#sustract the influence of GC resi <- log(RC_DT$RC+0.01)-predictions1 RC_DT$RC <- resi setkey(RC_DT,GC)此时RC_DT$RC就是normalize之后的RC
画图显示nomalize之后的RC,并将拟合的loess曲线和normalize之后的数据保存
当然,也想看一下amplicon 长度len 对RC的影响,不过影响不大
全部代码如下:
library(data.table) load("/home/ywliao/project/Gengyan/RC_DT.Rdata") ####loess GC vs RC#### gcCount.loess <- loess(log(RC+0.01)~GC,data=RC_DT,control = loess.control(surface = "direct"),degree=2) predictions1<- predict (gcCount.loess,RC_DT$GC) #plot scatter and line plot(RC_DT$GC,log(RC_DT$RC+0.01),cex=0.1,xlab="GC Content",ylab=expression(paste("log(NRC"["lib"],"+0.01)",sep=""))) lines(RC_DT$GC,predictions1,col = "red") #sustract the influence of GC resi <- log(RC_DT$RC+0.01)-predictions1 RC_DT$RC <- resi setkey(RC_DT,GC) #plot scatter and line using Norm GC data plot(RC_DT$GC,RC_DT$RC,cex=0.1,xlab="GC Content",ylab=expression("NRC"["GC"])) gcCount.loess <- loess(RC~GC,data=RC_DT,control = loess.control(surface = "direct"),degree=2) save(gcCount.loess,file="/home/ywliao/project/Gengyan/gcCount.loess.Robject") predictions2 <- predict(gcCount.loess,RC_DT$GC) lines(RC_DT$GC,predictions2,col="red") save(RC_DT,file="/home/ywliao/project/Gengyan/RC_DT.Rdata") ####loess len vs RC### setkey(RC_DT,Len) len.loess <- loess(RC_DT$RC~RC_DT$Len, control = loess.control(surface = "direct"),degree=2) predictions2<- predict (len.loess,RC_DT$Len) #plot scatter and line plot(RC_DT$Len,RC_DT$RC,cex=0.1,xlab="Length",ylab=expression(paste("log(RC"["GC"],"+0.01)",sep=""))) lines(RC_DT$Len,predictions2,col = "red")