Showing posts with label trpc2. Show all posts
Showing posts with label trpc2. Show all posts

Wednesday, December 2, 2015

Running RELAX.


We are looking for relaxed selection in Trpc2 of several lineages of bats that have reduced morphology of vomeronasal system organs or brain regions.

To run RELAX:
RELAX is relatively recent and you have to download the most recent batch files for HyPhy.
Start HyPhy.
Open->Open Batch File...
Navigate to Applications/hyphy/res/TemplateBatchFiles/
Select RELAX.bf
It will automatically prompt you to select your data/alignment file.
Then select your tree file.
Make sure the correct number of Ts, Rs, and Us match.
It should hopefully begin, assuming you have your tree set up correctly and your alignment does not have any stop codons AND it is in frame.

Something VERY critical if you are running it in the GUI...you must select the test branches directly when it asks for it, not just when you put them in the tree. Otherwise it estimates the reference branches as the test branches and vice versa.

RELAX1:
Alignment: bat_data_match.nex
Tree: bat_tree_match_RELAX1.tre
(((((((((((((((((((((Artibeus_jamaicensis{R}:2.431373248,Artibeus_obscurus{R}:2.431373248){R}:0.3588412223,Artibeus_fimbriatus{R}:2.79021447){R}:0.9561230126,(Artibeus_intermedius{R}:0.6313580609,Artibeus_lituratus{R}:0.6313580609){R}:3.114979422){R}:1.424889238,Artibeus_inopinatus{R}:5.171226721){R}:0.9218212476,Artibeus_concolor{R}:6.093047969){R}:2.392412914,((((Artibeus_watsoni{R}:2.760434077,Artibeus_incomitatus{R}:2.760434077){R}:1.940812275,Artibeus_phaeotis{R}:4.701246352){R}:0.9218549563,Artibeus_cinereus{R}:5.623101308){R}:0.8637586984,(Artibeus_glaucus{R}:3.316302056,Artibeus_gnomus{R}:3.316302056){R}:3.170557951){R}:1.998600876){R}:2.413693209,(((Ariteus_flavescens{R}:2.487096883,Ardops_nichollsi{R}:2.487096883){R}:2.828400699,Phyllops_falcatus{R}:5.315497583){R}:0.8401618707,((Pygoderma_bilabiatum{R}:3.602361855,Sphaeronycteris_toxophyllum{R}:3.602361855){R}:1.181679598,Centurio_senex{R}:4.784041453){R}:1.371618001){R}:4.743494639){R}:1.13714402,Ectophylla_alba{R}:12.03629811){R}:0.9420299332,Enchisthenes_hartii{R}:12.97832805){R}:1.825921019,((((((((Platyrrhinus_nigellus{R}:2.504001415,Platyrrhinus_dorsalis{R}:2.504001415){R}:0.4677903585,(Platyrrhinus_infuscus{R}:2.110403375,Platyrrhinus_aurarius{R}:2.110403375){R}:0.8613883988){R}:1.095202405,Platyrrhinus_albericoi{R}:4.066994179){R}:2.177118584,Platyrrhinus_lineatus{R}:6.244112763){R}:1.779671702,Vampyrodes_caraccioli{R}:8.023784464){R}:2.695962001,(Vampyressa_thyone{R}:7.747059999,Mesophylla_macconnelli{R}:7.747059999){R}:2.972686466){R}:0.777302653,(((Chiroderma_villosum{R}:1.968379874,Chiroderma_improvisum{R}:1.968379874){R}:1.432916649,Chiroderma_doriae{R}:3.401296523){R}:5.790148924,(Vampyressa_brocki{R}:7.444560681,Vampyressa_bidens{R}:7.444560681){R}:1.746884766){R}:2.305603671){R}:0.737544275,Uroderma_magnirostrum{R}:12.23459339){R}:2.569655671){R}:2.978950491,(((Sturnira_bogotensis{R}:4.351270017,Sturnira_tildae{R}:4.351270017){R}:0.6450817233,Sturnira_magna{R}:4.99635174){R}:0.4749192885,(Sturnira_oporaphilum{R}:0.9163878452,Sturnira_lilium{R}:0.9163878452){R}:4.554883183){R}:12.31192853){R}:1.896666128,Rhinophylla_fischerae{R}:19.67986568){R}:1.622496147,(((((Carollia_perspicillata{R}:2.166884003,Carollia_brevicauda{R}:2.166884003){R}:0.7022808564,Carollia_sowelli{R}:2.869164859){R}:1.049593577,Carollia_subrufa{R}:3.918758436){R}:4.394363478,Carollia_castanea{R}:8.313121914){R}:11.76400773,(Glyphonycteris_sylvestris{R}:15.43079994,Trinycteris_nicefori{R}:15.43079994){R}:4.6463297){R}:1.225232188){R}:1.57140971,((((Lonchophylla_hesperia{R}:9.371624678,Lionycteris_spurrelli{R}:9.371624678){R}:2.706833404,Platalina_genovensium{R}:12.07845808){R}:1.204240126,Lonchophylla_chocoana{R}:13.28269821){R}:2.390290187,Lonchophylla_mordax{R}:15.6729884){R}:7.200783145){R}:1.155625037,((((((Glossophaga_longirostris{R}:5.673316434,Glossophaga_commissarisi{R}:5.673316434){R}:1.473493579,Glossophaga_soricina{R}:7.146810013){R}:4.41746123,(Leptonycteris_nivalis{R}:1.825719598,Leptonycteris_curasoae{R}:1.825719598){R}:9.738551645){R}:6.408254338,(Brachyphylla_pumila{T}:1.377027909,Brachyphylla_cavernarum{T}:1.377027909){T}:16.59549767){R}:1.44044124,(((Musonycteris_harrisoni{T}:1.250127196,Choeronycteris_mexicana{T}:1.250127196){T}:4.862545039,Choeroniscus_godmani{T}:6.112672234){T}:10.9120772,((Anoura_latidens{R}:3.779427833,Anoura_geoffroyi{R}:3.779427833){R}:2.83493221,Anoura_caudifer{R}:6.614360043){R}:10.41038939){R}:2.388217391){R}:3.771706573,(Lonchorhina_aurita{R}:8.795556441,Lonchorhina_inusitata{R}:8.795556441){R}:14.38911695){R}:0.8447231842){R}:0.9614243908,(((((Lophostoma_evotis{R}:2.279691562,Lophostoma_silvicolaFG{R}:2.279691562){R}:7.690705428,Lophostoma_brasiliense{R}:9.97039699){R}:7.599298001,(Phyllostomus_elongatus{R}:8.119582824,Phyllostomus_discolor{R}:8.119582824){R}:9.450112167){R}:1.521869862,(Tonatia_saurophila{R}:4.502755885,Tonatia_bidens{R}:4.502755885){R}:14.58880897){R}:3.674523748,(Chrotopterus_auritus{R}:19.15333438,Mimon_cozumelae{R}:19.15333438){R}:3.61275422){R}:2.224732367){R}:2.433669076,((Diaemus_youngi{R}:12.39244932,Desmodus_rotundus{R}:12.39244932){R}:7.835663972,Diphylla_ecaudata{R}:20.22811329){R}:7.196376756){R}:0.8214752215,(((Micronycteris_microtis{R}:9.018674346,Micronycteris_hirsuta{R}:9.018674346){R}:5.324423535,(Micronycteris_minuta{R}:4.882043938,Micronycteris_schmidtorum{R}:4.882043938){R}:9.461053943){R}:8.513447254,Lampronycteris_brachyotis{R}:22.85654514){R}:5.389420131){R}:7.944937657,((((Pteronotus_quadridens{T}:6.670886247,Pteronotus_macleayii{T}:6.670886247){T}:4.04633203,Pteronotus_davyi{T}:10.71721828){T}:3.138003269,Pteronotus_parnellii{T}:13.85522155){T}:18.91999925,Mormoops_megalophylla{T}:32.7752208){T}:3.415682128){R}:5.869690062,Thyroptera_lavali{T}:42.06059299){R}:13.44742223,((((((((Miniopterus_aelleni{R}:1.299615,Miniopterus_petersoni{R}:1.299615){R}:4.270025,Miniopterus_gleni{R}:5.569641){R}:2.003838,Miniopterus_sororculus{R}:7.573479){R}:2.031566,Miniopterus_minor{R}:9.605045){R}:14.889341,(Miniopterus_fuliginosus{R}:17.236618,Miniopterus_schreibersii{R}:17.236618){R}:7.257769){R}:27.640976,Myotis_elegans{T}:52.13536){R}:1.662856,Molossus_sinaloae{T}:53.798217){R}:0.976298,Chilonatalus_micropus{T}:54.774513){R}:0.873379);


Load this file into HyPhy-vision
Results from the first run (Important results are in the .json file):

Test for selection relaxation (K = 0.43) was significant (p = 0.0000, LR = 46.62)

  • Modellog L# par.AICcTime to fitLtreeBranch setω1ω2ω3
    Partitioned MG94xREV-4116.852188675.524 min. 44 sec.1.33Reference0.102 (100%)
    Test0.367 (100%)
    General Descriptive-4024.514228915.0415 min. 55 sec.3.99All0.156 (100%)0.601 (0.0%)10.7 (0.0%)
    Null-4127.442218702.867 min. 9 sec.4.05Reference0.139 (99%)0.141 (0.48%)53.5 (0.020%)
    Test0.139 (99%)0.141 (0.48%)53.5 (0.020%)
    Alternative-4104.132228658.3034 min. 33 sec.3.99Reference0.0563 (5.6%)0.0763 (94%)3.83 (0.57%)
    Test0.290 (5.6%)0.331 (94%)1.78 (0.57%)
    Partitioned Exploratory-4104.052268666.3612 min. 26 sec.3.99Reference0.0231 (5.9%)0.0743 (93%)3.16 (0.84%)
    Test0.290 (0.0%)0.334 (100%)1.42 (0.0%)









Model (click to enlarge): Alternative (A), General Descriptive (B)


ω distributions under the Alternative model

Test branches are shown in blue and reference branches are shown in red

ω distributions under the Null model

Test branches are shown in blue and reference branches are shown in red

ω distributions under the Partitioned Exploratory model

Test branches are shown in blue and reference branches are shown in red

ω distributions under the Partitioned MG94xREV model

Test branches are shown in blue and reference branches are shown in red


The x-axis of the graphs comes out horribly, but other than that, I'm pleased with the visualization package.

Monday, November 23, 2015

Find the best model of evolution for Trpc2 using modelomatic:
#For this analysis, all stop codons need to be removed.
#For troubleshooting modelOmatic, I wrote a more detailed post here.
#Navigate to the correct directory and run this command:
modelomatic_OSX Trpc2_alignment_noStops.phy bionj Trpc2_modelomatic_normal.txt 0 normal

Interestingly REV+4dG is the best model, despite it being a protein coding gene. This is essentially "GTR+G". The 4dG is showing the four rate categories (this is an arbitrary number) for the gamma parameter.

Find the "best" (ML) tree in GALI using search replicates:
#run this command
garli Trpc2_garli.conf
#Note how to implement GTR+G in GARLI. See here for more examples of other models.
#Here is my configuration file (Trpc2_garli.conf):
[general]
datafname = Trpc2_garli.nex
constraintfile = none
streefname = stepwise
attachmentspertaxon = 50
ofprefix = Trpc2.GTR4DG
randseed = -1
availablememory = 512
logevery = 10
saveevery = 100
refinestart = 1
outputeachbettertopology = 0
outputcurrentbesttopology = 0
enforcetermconditions = 1
genthreshfortopoterm = 20000
scorethreshforterm = 0.05
significanttopochange = 0.01
outputphyliptree = 0
outputmostlyuselessfiles = 0
writecheckpoints = 0
restart = 0
outgroup = 1
resampleproportion = 1.0
inferinternalstateprobs = 0
outputsitelikelihoods = 0
optimizeinputonly = 0
collapsebranches = 1

searchreps = 8
bootstrapreps = 0

[model1]
datatype = nucleotide
ratematrix = 6rate
statefrequencies = estimate
ratehetmodel = gamma
numratecats = 4
invariantsites = none

[master]
nindivs = 4
holdover = 1
selectionintensity = 0.5
holdoverpenalty = 0
stopgen = 5000000
stoptime = 5000000

startoptprec = 0.5
minoptprec = 0.01
numberofprecreductions = 10
treerejectionthreshold = 50.0
topoweight = 1.0
modweight = 0.05
brlenweight = 0.2
randnniweight = 0.1
randsprweight = 0.3
limsprweight =  0.6
intervallength = 100
intervalstostore = 5
limsprrange = 6
meanbrlenmuts = 5
gammashapebrlen = 1000
gammashapemodel = 1000
uniqueswapbias = 0.1
distanceswapbias = 1.0

Putting bootstrap values on GARLI:
You can bootstrap in GARLI, but to "put" the bootstraps on the nodes, you have to summarize the bootstrap results using an outside method. SumTrees.py in the Dendropy python package works well. I basically followed the exact instructions somebody wrote here.

Running bootstrap iterations in GARLI:
#I ran four independent runs simultaneously of 250 bootstrap iterations that will combine to = 1000.
#Here is my GARLI configuration file:
[general]
datafname = Trpc2_garli.nex
constraintfile = none
streefname = stepwise
attachmentspertaxon = 50
ofprefix = Trpc2.GTR4DG.boot
randseed = -1
availablememory = 512
logevery = 10
saveevery = 100
refinestart = 1
outputeachbettertopology = 0
outputcurrentbesttopology = 0
enforcetermconditions = 1
genthreshfortopoterm = 20000
scorethreshforterm = 0.05
significanttopochange = 0.01
outputphyliptree = 0
outputmostlyuselessfiles = 0
writecheckpoints = 0
restart = 0
outgroup = 1
resampleproportion = 1.0
inferinternalstateprobs = 0
outputsitelikelihoods = 0
optimizeinputonly = 0
collapsebranches = 1

searchreps = 1
bootstrapreps = 250

[model1]
datatype = nucleotide
ratematrix = 6rate
statefrequencies = estimate
ratehetmodel = gamma
numratecats = 4
invariantsites = none

[master]
nindivs = 4
holdover = 1
selectionintensity = 0.5
holdoverpenalty = 0
stopgen = 5000000
stoptime = 5000000

startoptprec = 0.5
minoptprec = 0.01
numberofprecreductions = 10
treerejectionthreshold = 20.0
topoweight = 1.0
modweight = 0.05
brlenweight = 0.2
randnniweight = 0.1
randsprweight = 0.3
limsprweight =  0.6
intervallength = 100
intervalstostore = 5
limsprrange = 6
meanbrlenmuts = 5
gammashapebrlen = 1000
gammashapemodel = 1000
uniqueswapbias = 0.1
distanceswapbias = 1.0

Installing dendropy
#We have "pip" set up on our server, making the python package easy to install:
sudo pip install -U dendropy

Running SumTrees
#place all of your bootstrap outputs and your best tree from GARLI in its own folder

cd /Volumes/Yango/Trpc2/garli/

mkdir sumtrees
cp Trpc2.GTR4DG.best.tre sumtrees/
cp Trpc2.GTR4DG.boot.boot.tre sumtrees/
cp Trpc2.GTR4DG.boot2.boot.tre sumtrees/
cp Trpc2.GTR4DG.boot3.boot.tre sumtrees/
cp Trpc2.GTR4DG.boot4.boot.tre sumtrees/

#from within the folder, run sumtrees.py
#your target tree is the "best" tree from your GARLI output

sumtrees.py --decimals=0 --percentages --no-analysis-metainformation --no-taxa-block Trpc2.GTR4DG.boot.boot.tre Trpc2.GTR4DG.boot2.boot.tre Trpc2.GTR4DG.boot3.boot.tre Trpc2.GTR4DG.boot4.boot.tre --target=Trpc2.GTR4DG.best.tre --output=supportOnBest.phy

Viewing tree with bootstraps
Now, open supportOnBest.phy in FigTree. Click on "node labels" and select "support" as the node label. These are your bootstrap values.

Monday, August 31, 2015

I am going to make a gene tree of all the Trpc2 sequences to demonstrate how different the pseudogenes are from the conserved gene--the hypothesis being that the pseudogenes will have much longer branch lengths.

Running modelo
modelomatic_OSX Trpc2_modelomatic.phy bionj Trpc2_modelomatic_normal.txt 0 normal
---------------------------------------------------------------------------------------------
  ModelOMatic (v1.01 (release)).
A program for choosing substitutions models for phylogenetic inference.
Written by Simon Whelan.
Contributions from James Allen, Ben Blackburne and David Talavera.
---------------------------------------------------------------------------------------------
Data: <Trpc2_modelomatic.phy>: 118 sequences of length 489 (DataMatrix: 118 x 380)
Checking for sparse (>85% gaps) sequences
Starting with 118 ... after removal there are 118 sequences
Creating start tree ... 
Assertion failed: (0), function Error, file tools.cxx, line 235.
Unrecognised codon: TAG for data codon position 153 in sequence 52...Abort trap: 6
105-238:modelomatic loloyohe$ modelomatic_OSX Trpc2_modelomatic.phy bionj Trpc2_modelomatic_normal.txt 0 normal

---------------------------------------------------------------------------------------------
  ModelOMatic (v1.01 (release)).
A program for choosing substitutions models for phylogenetic inference.
Written by Simon Whelan.
Contributions from James Allen, Ben Blackburne and David Talavera.
---------------------------------------------------------------------------------------------
Data: <Trpc2_modelomatic.phy>: 118 sequences of length 489 (DataMatrix: 118 x 380)
Checking for sparse (>85% gaps) sequences
Starting with 118 ... after removal there are 118 sequences
Creating start tree ...  estimated using bionj (0.813652s)
Optimisation settings: normal 
Scanning for modelomatic.ini file ...
Amino acid models skipping rtREV, HIVb, HIVw
Codon models skipping F0, F1X4, F3X4, F64 done
Working with genetic code: Universal
>>> Doing model analysis <<< 
RY Done  (2.84812s)
NT Done  (65.4292s)
AA Done  (267.533s)...
Codon Done  (0.255969s)

Outputting results to <Trpc2_modelomatic_normal.txt>

Successful exit

For once, something worked. The model with the best fit is the K2P+4dG. This is the Kimura two-parameter + gamma with four site rates. A good summary of the nucleotide models can be found here. K2P means that the transitions (alpha) and transversions (beta) will have different rates, but that all nucleotides occur at the same frequency. This model can easily be implemented in GARLI. The K2P model is just a simpler HKY model--HKY allows for different transition and transversion rates, but also allows 4 parameters for nucleotide substitutions. I am using the GARLI on CIPRES. 

#everything is default except
Maximum hours to run=48
-searchreps=8
-bootstrapreps=0
-datatype=nucleotide
#to make it K2P make sure to change base statefrequencies
-ratematrix=HKY(2rate)
-statefrequencies=fixed
-invariant sites=none
#this is where the 4dG comes in from modelomatic
-ratehetmodel=gamma
-numratecats=4



ERROR: state frequencies specified as fixed, but no

        Garli block found in Trpc2_garli.phy!!

Oops, it choked. -statefrequencies should be set to equal:

#everything is default except
Maximum hours to run=48
-searchreps=8
-bootstrapreps=0
-datatype=nucleotide
#to make it K2P make sure to change base statefrequencies
-ratematrix=HKY(2rate)
-statefrequencies=equal
-invariant sites=none
#this is where the 4dG comes in from modelomatic
-ratehetmodel=gamma

-numratecats=4
From the best tree from GARLI:

Now onto testing if some branch lengths are significantly longer...

Thursday, June 25, 2015

PAML! Oh, good fun.

The worst part about PAML is getting ready to run PAML.

Be sure to keep organized. To run PAML, navigate to /Applications/paml4.8/bin/. There should be all the different PAML binaries listed.


For the null model:
bash-3.2$ sudo ./codeml codeml.ctl 


Tree file with two groups: group 0 is taxa with 
(((((((((((((((((((((Artibeus_fimbriatus,(Artibeus_jamaicensis,Artibeus_obscurus)),Artibeus_inopinatus),Artibeus_concolor),((((Artibeus_watsoni,Artibeus_incomitatus),Artibeus_phaeotis),Artibeus_cinereus),(Artibeus_glaucus,Artibeus_gnomus))),((Phyllops_falcatus,(Ariteus_flavescens,Ardops_nichollsi)),((Pygoderma_bilabiatum,Sphaeronycteris_toxophyllum),Centurio_senex))),Ectophylla_alba),Enchisthenes_hartii),((((((((Platyrrhinus_nigellus,Platyrrhinus_dorsalis),(Platyrrhinus_infuscus,Platyrrhinus_aurarius)),Platyrrhinus_albericoi),Platyrrhinus_lineatus),Vampyrodes_caraccioli),(Vampyressa_thyone,Mesophylla_macconnelli)),(((Chiroderma_villosum,Chiroderma_improvisum),Chiroderma_doriae),(Vampyressa_brocki,Vampyressa_bidens))),Uroderma_magnirostrum)),(((Sturnira_bogotensis,Sturnira_tildae),Sturnira_magna),(Sturnira_oporaphilum,Sturnira_lilium))),Rhinophylla_fischerae),(((((Carollia_perspicillata,Carollia_brevicauda),Carollia_sowelli),Carollia_subrufa),Carollia_castanea),(Glyphonycteris_sylvestris,Trinycteris_nicefori))),((((Lonchophylla_hesperia,Lionycteris_spurrelli),Platalina_genovensium),Lonchophylla_chocoana),Lonchophylla_mordax)),((((((Glossophaga_longirostris,Glossophaga_commissarisi),Glossophaga_soricina),Leptonycteris_curasoae),((Phyllonycteris_poeyi,(Erophylla_sezekorni,Erophylla_bombifrons)),(Brachyphylla_pumila,Brachyphylla_cavernarum))$1),((Musonycteris_harrisoni,Choeroniscus_godmani),((Anoura_latidens,Anoura_geoffroyi),Anoura_caudifer))),(Lonchorhina_aurita,Lonchorhina_inusitata))),(((Phyllostomus_discolor,(Lophostoma_brasiliense,(Lophostoma_evotis,Lophostoma_silvicola))),(Tonatia_saurophila,Tonatia_bidens)),Mimon_cozumelae)),((Diaemus_youngi,Desmodus_rotundus),Diphylla_ecaudata)),((Micronycteris_hirsuta,(Micronycteris_minuta,Micronycteris_schmidtorum)),Lampronycteris_brachyotis)),(((Pteronotus_davyi,Pteronotus_quadridens),Pteronotus_parnellii),(Mormoops_megalophylla,Mormoops_blainvillei)$1)),(Noctilio_leporinus,Furipterus_horrens)$1),(Thyroptera_lavali,Thyroptera_tricolor)$1),Tadarida_brasilense#1,Chilonatalus_micropus#1),Cormura_brevirostris#1,((Megaderma_lyra,(Rhinolophus_ferrumequinum,Hipposideros_pratti)),((Acerodon_celebensis,Eonycteris_spelaea),Cynopterus_sphinx))$1);

codeml.CladeC.ctl
      seqfile = bat_data_match.phylip * sequence data filename
     treefile = bat_tree_match_cladeModelC_2groups.tre      * tree structure file name
      outfile = trpc2cladeModelC_2groups_0point1.txt           * main result file name

        noisy = 9  * 0,1,2,3,9: how much rubbish on the screen
      verbose = 0  * 0: concise; 1: detailed, 2: too much
      runmode = 0  * 0: user tree

      seqtype = 1  * 1:codons; 2:AAs; 3:codons-->AAs
    CodonFreq = 2  * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table
        clock = 0  * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis

  * Model C: model = 3  NSsites = 2 
    * Model D: model = 3  NSsites = 3

        model = 3  * 3 = clade models
      NSsites = 2  * choose "2" or "3"
 
        icode = 0  * 0:universal code; 1:mammalian mt; 2-10:see below

    fix_kappa = 0  * 1: kappa fixed, 0: kappa to be estimated
        kappa = 2.5  * initial or fixed kappa

    fix_omega = 0  * 1: omega or omega_1 fixed, 0: estimate 
        omega = 0.1 * initial or fixed omega, for codons or codon-based AAs

    fix_alpha = 1  * 0: estimate gamma shape parameter; 1: fix it at alpha
        alpha = 0. * initial or fixed alpha, 0:infinity (constant rate)

        ncatG = 3  * # of categories in dG of NSsites models

        getSE = 0  * 0: don't want them, 1: want S.E.s of estimates
 RateAncestor = 0  * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)

   Small_Diff = .2e-6
    cleandata = 0  * remove sites with ambiguity data (1:yes, 0:no)?



We are ready to go. Run the control file.
bash-3.2$ sudo ./codeml codeml.CladeC.ctl