Laurel's Lab Notebook: R

Showing posts with label R. Show all posts

Tuesday, September 16, 2014

Okay, I have finally accepted that my data is not fitting an Ornstein-Uhlenbeck model of evolution and it makes zero sense to be running the tests that I have been running. My alpha value is way too small for all of my traits compared to the height of my tree.

I have run the fitContinuous() function in the "geiger" package on the geometric mean.

Thursday, August 21, 2014

#prior lambda = 6
#for the geometric mean (body size)
x<-data$GM
names(x)<-rownames(data)
prior <- make.prior(tree, model = "OU", param=list("dalpha"=list(),"dsig2"=list(),"dtheta"=list(),
"dk"=list(lambda=6,kmax=length(tree$tip.label)*2-2),

"dloc"=list(min=0,max=1),
"dsb"=list(bmax=Inf,prob=tree$edge.length)))

> effectiveSize(combine$lnL)
var1
47696.89
> effectiveSize(combine$alpha)
var1
17668.65
> effectiveSize(combine$k)
var1
200.3831

http://andrewgelman.com/2010/10/29/could_someone_p/

#GM
#lambda=15
prior <- make.prior(tree, dists=list(dalpha="dlnorm", dsig2="dlnorm",dsb="dsb", dk="cdpois",
dtheta="dnorm"), param=list(dalpha=list(meanlog=-5, sdlog=2),
dsig2=list(meanlog=-1, sdlog=5), dk=list(lambda=15, kmax=200),
dsb=list(bmax=Inf,prob=tree$edge.length),
dtheta=list(mean=mean(x), sd=2)))

fit.1<-bayou.mcmc(tree, x, model = "OU",prior= prior, ngen = 500000)
fit.2<-bayou.mcmc(tree, x,model = "OU", prior= prior, ngen = 500000)
chain.1 <- load.bayou(fit.1, save.Rdata=FALSE, cleanup=TRUE)
chain.2 <- load.bayou(fit.2, save.Rdata=FALSE, cleanup=TRUE)

combine<-combine.chains(chain.1, chain.2, burnin.prop = 0.2)

> effectiveSize(combine$lnL)
var1
54733.55
> effectiveSize(combine$alpha)
var1
21377.18
> effectiveSize(combine$k)
var1
460.0945
> median(combine$alpha)
[1] 0.005785393
> median(combine$k)
[1] 1
> median(combine$lnL)
[1] -226.6656
> median(combine$sig2)
[1] 0.1995975
> median(combine$ntheta)
[1] 2
> median(combine$theta)

plotSimmap.mcmc(tree,combine, burnin = 0, fsize = 0.5,circle.lwd = 0.5 )

#pPC1, lambda = 6
x<-data$PC1
names(x)<-rownames(data)
prior <- make.prior(tree, model = "OU", param=list("dalpha"=list(),"dsig2"=list(),"dtheta"=list(),
"dk"=list(lambda=6,kmax=length(tree$tip.label)*2-2),
"dloc"=list(min=0,max=1),
"dsb"=list(bmax=Inf,prob=tree$edge.length)))
> effectiveSize(combine$lnL)
var1
77556.68
> effectiveSize(combine$alpha)
var1
750.1128
> effectiveSize(combine$k)
var1
1098.867

> median(combine$alpha)

[1] 0.01054099

> median(combine$k)

[1] 5

> median(combine$lnL)

[1] -232.5772

> median(combine$sig2)

[1] 0.2309304

> median(combine$ntheta)

[1] 6

#pPC1

#lambda=15
prior <- make.prior(tree, dists=list(dalpha="dlnorm", dsig2="dlnorm",dsb="dsb", dk="cdpois",
dtheta="dnorm"), param=list(dalpha=list(meanlog=-5, sdlog=2),
dsig2=list(meanlog=-1, sdlog=5), dk=list(lambda=15, kmax=200),
dsb=list(bmax=Inf,prob=tree$edge.length),
dtheta=list(mean=mean(x), sd=2)))

fit.1<-bayou.mcmc(tree, x, model = "OU",prior= prior, ngen = 500000, plot.freq = NULL)
fit.2<-bayou.mcmc(tree, x,model = "OU", prior= prior, ngen = 500000, plot.freq = NULL)
chain.1 <- load.bayou(fit.1, save.Rdata=FALSE, cleanup=TRUE)
chain.2 <- load.bayou(fit.2, save.Rdata=FALSE, cleanup=TRUE)

combine<-combine.chains(chain.1, chain.2, burnin.prop = 0.2)

> effectiveSize(combine$lnL)
var1
46687.59
> effectiveSize(combine$alpha)
var1
1882.031
> effectiveSize(combine$k)
var1
1916.088
> median(combine$alpha)
[1] 0.01097906
> median(combine$k)
[1] 6
> median(combine$lnL)
[1] -682.1394
> median(combine$sig2)
[1] 187.9879
> median(combine$ntheta)
[1] 7

Wednesday, July 2, 2014

Plans for today:

[1] Find missing data for high school student
[2] Figure out how to teach RaxML to a high school student (JW slides) COMPLETE
[3] Get Gazey-Staley algorithm set up in R and running. COMPLETE
-Okay, so we don't actually have to do this step, as its just a measure of cloning success (and since we didn't clone. We don't really need to measure this. What we do need to measure (but not right away, is the # of sequences (reads) v. # genes (unique contigs). This will tell us if we have sampled enough of the genome for the OR genes given our degenerate primers.

Anyways, to download the R code for the Gazey-Staley algorithm along with a heterogeneity test (with very good documentation), visit:
http://batlab.ucd.ie/~spuechmaille/

[4] Figure out most recent problem with babbler paper.
[5] Set up ORA pipeline COMPLETE
____________________________________________
To install ORA (after having install BioPerl, HMMER, and FASTA), download the .zip file from:
http://search.cpan.org/~ceratites/ora-1.9.1/lib/Bio/ORA.pm#DRIVER_SCRIPT

Unzip and navigate to the ORA folder.

#run the perl script to make the "makefile"

perl Makefile.PL

sudo make install

perl Build.PL

WOOOOHOOOO got it working.

There are two main issues (assuming you have everything installed).

1) You must run the perl script in the /ora-*/scripts folder because it must access the .hmm files

2) When you install ORA, it includes the older version of HMMER. The error message you get when you run "or.pl", is:

...

binary auxfiles are in an outdated HMMER format (3/b); please hmmpress your HMM file again

To fix this, do the following (assuming you have HMMER installed).

-In the /ora-*/scripts folder, there should be several files related to HMMER

Delete the following: or.hmm.h3f or.hmm.h3i or.hmm.h3m or.hmm.h3p

On the command line, type:

hmmpress or.hmm

That should fix things.

To run the ORA:

perl or.pl --sequence ~/Documents/Analyses/Carrollia_OR/GPC-trim.deg.fasta

#still playing with all of the fun parameters

Now I need to trim the data of a certain length.

By the end of the weekend, it would be great to make a tree from the sequences.

Saturday, June 15, 2013

WOOT--figured out how to date a random sample of my posterior of trees. Super straightforward:

#read in tree file to make multiphylo R object--ignoroing the first 25% as burn-in
tree.list<-read.tree(file.choose(),keep.multi=TRUE, skip = 3374)
#take random sample of trees
sample.list<-sample(tree.list,500)

#date the trees using lapply() and parameters established from Moyle, et al (2012)
newmulti<-lapply(unclass(sample.list),chronopl,lambda=0.5,age.min=c(8.4, 13, 27.1), age.max=c(11.4,17, 37.3), node = c(336, 297, 293))

#rename the object's class

class(newmulti)<-"multiPhylo"

As per Revell's comment on this post: http://comments.gmane.org/gmane.comp.lang.r.phylo/2773