Laurel's Lab Notebook: python

Showing posts with label python. Show all posts

Monday, November 23, 2015

Find the best model of evolution for Trpc2 using modelomatic:
#For this analysis, all stop codons need to be removed.
#For troubleshooting modelOmatic, I wrote a more detailed post here.
#Navigate to the correct directory and run this command:

modelomatic_OSX Trpc2_alignment_noStops.phy bionj Trpc2_modelomatic_normal.txt 0 normal

Interestingly REV+4dG is the best model, despite it being a protein coding gene. This is essentially "GTR+G". The 4dG is showing the four rate categories (this is an arbitrary number) for the gamma parameter.

Find the "best" (ML) tree in GALI using search replicates:
#run this command
garli Trpc2_garli.conf
#Note how to implement GTR+G in GARLI. See here for more examples of other models.
#Here is my configuration file (Trpc2_garli.conf):
[general]
datafname = Trpc2_garli.nex
constraintfile = none
streefname = stepwise
attachmentspertaxon = 50
ofprefix = Trpc2.GTR4DG
randseed = -1
availablememory = 512
logevery = 10
saveevery = 100
refinestart = 1
outputeachbettertopology = 0
outputcurrentbesttopology = 0
enforcetermconditions = 1
genthreshfortopoterm = 20000
scorethreshforterm = 0.05
significanttopochange = 0.01
outputphyliptree = 0
outputmostlyuselessfiles = 0
writecheckpoints = 0
restart = 0
outgroup = 1
resampleproportion = 1.0
inferinternalstateprobs = 0
outputsitelikelihoods = 0
optimizeinputonly = 0
collapsebranches = 1

searchreps = 8
bootstrapreps = 0

[model1]
datatype = nucleotide
ratematrix = 6rate
statefrequencies = estimate
ratehetmodel = gamma
numratecats = 4
invariantsites = none

[master]
nindivs = 4
holdover = 1
selectionintensity = 0.5
holdoverpenalty = 0
stopgen = 5000000
stoptime = 5000000

startoptprec = 0.5
minoptprec = 0.01
numberofprecreductions = 10
treerejectionthreshold = 50.0
topoweight = 1.0
modweight = 0.05
brlenweight = 0.2
randnniweight = 0.1
randsprweight = 0.3
limsprweight = 0.6
intervallength = 100
intervalstostore = 5
limsprrange = 6
meanbrlenmuts = 5
gammashapebrlen = 1000
gammashapemodel = 1000
uniqueswapbias = 0.1
distanceswapbias = 1.0

Putting bootstrap values on GARLI:
You can bootstrap in GARLI, but to "put" the bootstraps on the nodes, you have to summarize the bootstrap results using an outside method. SumTrees.py in the Dendropy python package works well. I basically followed the exact instructions somebody wrote here.

Running bootstrap iterations in GARLI:
#I ran four independent runs simultaneously of 250 bootstrap iterations that will combine to = 1000.
#Here is my GARLI configuration file:
[general]
datafname = Trpc2_garli.nex
constraintfile = none
streefname = stepwise
attachmentspertaxon = 50
ofprefix = Trpc2.GTR4DG.boot
randseed = -1
availablememory = 512
logevery = 10
saveevery = 100
refinestart = 1
outputeachbettertopology = 0
outputcurrentbesttopology = 0
enforcetermconditions = 1
genthreshfortopoterm = 20000
scorethreshforterm = 0.05
significanttopochange = 0.01
outputphyliptree = 0
outputmostlyuselessfiles = 0
writecheckpoints = 0
restart = 0
outgroup = 1
resampleproportion = 1.0
inferinternalstateprobs = 0
outputsitelikelihoods = 0
optimizeinputonly = 0
collapsebranches = 1

searchreps = 1
bootstrapreps = 250

[model1]
datatype = nucleotide
ratematrix = 6rate
statefrequencies = estimate
ratehetmodel = gamma
numratecats = 4
invariantsites = none

[master]
nindivs = 4
holdover = 1
selectionintensity = 0.5
holdoverpenalty = 0
stopgen = 5000000
stoptime = 5000000

startoptprec = 0.5
minoptprec = 0.01
numberofprecreductions = 10
treerejectionthreshold = 20.0
topoweight = 1.0
modweight = 0.05
brlenweight = 0.2
randnniweight = 0.1
randsprweight = 0.3
limsprweight = 0.6
intervallength = 100
intervalstostore = 5
limsprrange = 6
meanbrlenmuts = 5
gammashapebrlen = 1000
gammashapemodel = 1000
uniqueswapbias = 0.1
distanceswapbias = 1.0

Installing dendropy
#We have "pip" set up on our server, making the python package easy to install:

sudo pip install -U dendropy

Running SumTrees
#place all of your bootstrap outputs and your best tree from GARLI in its own folder

cd /Volumes/Yango/Trpc2/garli/

mkdir sumtrees

cp Trpc2.GTR4DG.best.tre sumtrees/

cp Trpc2.GTR4DG.boot.boot.tre sumtrees/

cp Trpc2.GTR4DG.boot2.boot.tre sumtrees/

cp Trpc2.GTR4DG.boot3.boot.tre sumtrees/

cp Trpc2.GTR4DG.boot4.boot.tre sumtrees/

#from within the folder, run sumtrees.py

#your target tree is the "best" tree from your GARLI output

sumtrees.py --decimals=0 --percentages --no-analysis-metainformation --no-taxa-block Trpc2.GTR4DG.boot.boot.tre Trpc2.GTR4DG.boot2.boot.tre Trpc2.GTR4DG.boot3.boot.tre Trpc2.GTR4DG.boot4.boot.tre --target=Trpc2.GTR4DG.best.tre --output=supportOnBest.phy

Viewing tree with bootstraps

Now, open supportOnBest.phy in FigTree. Click on "node labels" and select "support" as the node label. These are your bootstrap values.

Friday, October 16, 2015

Installing pbh5tools on OSX yosemite

I need to assemble PacBio amplicons of olfactory receptors. One of the main challenges I have been having is that most of the PacBio tools (SMRT-analysis, Celara assembler) focus on assembling whole genomes. Using an approach from Larson, et al. (2014) in BMC Genomics, I am going to do quality filtering using pbh5tools that will filter the reads at a minimum of passes (e.g. 4; though my mean number of passes is quite high for the data (~20)).

To install pbh5tools, I dealt with a series of challenges on Yosemite:
Download from github.

cd pbh5tools-master/

sudo python setup.py install

I ran into this pesky error.

In file included from /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/include/numpy/ndarraytypes.h:1760:

/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: 

      "Using deprecated NumPy API, disable it by "          "#defining

      NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]

#warning "Using deprecated NumPy API, disable it by " \

^

In file included from /tmp/easy_install-_W_eCV/h5py-2.5.0/h5py/defs.c:279:

/tmp/easy_install-_W_eCV/h5py-2.5.0/h5py/api_compat.h:27:10: fatal error: 

      'hdf5.h' file not found

#include "hdf5.h"

^

1 warning and 1 error generated.

error: Setup script exited with error: command 'cc' failed with exit status 1

First I tries to install hdf5 tool packages and then export the path there, but that wasn't working.

From the setup.py, I see that it needs:
install_requires=[
'pbcore >= 0.8.0',
'numpy >= 1.6.0',
'h5py >= 1.3.0'
]

sudo pip install pbcore

sudo pip install numpy

sudo pip install h5py

Then,

sudo python setup.py install

...

Using /Library/Python/2.7/site-packages

Searching for pysam==0.8.3

Best match: pysam 0.8.3

pysam 0.8.3 is already the active version in easy-install.pth

Using /Library/Python/2.7/site-packages

Finished processing dependencies for pbh5tools==0.8.0

Voila!

Tuesday, August 12, 2014

Here is a blog that gives excellent Trinity instructions:

https://cartwrightlab.wikispaces.com/RNAseq+Assembly+in+Trinity

#for paired end reads, combine each side of pairs into one file

cat pair.DR004_Artibeus_Jamaicensis_MOE_ACAGTG_L001_R1* >> Arjam_MOE_R1.fq

cat pair.DR004_Artibeus_Jamaicensis_MOE_ACAGTG_L001_R2* >> Arjam_MOE_R2.fq

#run a trimming script

python ~/Scripts/q-trim.py Arjam_MOE_R1.fq Arjam_MOE_R1_trim.fq

#woops I run into this error:

importerror no module named screed

#it looks like I need the "screed" module for this; looks like I also need to update "setup tools"

#first install pip

sudo python get-pip.py

pip install ipython #not sure what this did

#still missing the screed module

sudo pip install setuptools --upgrade

sudo pip install git+https://github.com/ged-lab/screed.git

#okay seems to be working now

#trimmed with the default setting in which a minimum Qscore = 5

#minimum length = 30; lets see how this works

python ~/Scripts/q-trim.py Arjam_MOE_R1.fq Arjam_MOE_R1_trim.fq

python ~/Scripts/q-trim.py Arjam_MOE_R2.fq Arjam_MOE_R2_trim.fq

#now i need to assemble paired reads--extracting only the pairs using both.py

python ~/Scripts/both.py Arjam_MOE_R1_trim.fq Arjam_MOE_R2_trim.fq

#this creates two files ...R1_trim.fq.both and R2_trim.fq.both

#this script took about one hour to run

#finally time to run trinity!

Trinity.pl --seqType fq --left Arjam_MOE_R1_trim.fq.both --right Arjam_MOE_R2_trim.fq.both --CPU 4 --JM 20G --output trinity_output/

Everything is up and running now. Stay tuned.