Monday, August 31, 2015

I am going to make a gene tree of all the Trpc2 sequences to demonstrate how different the pseudogenes are from the conserved gene. The hypothesis is that the pseudogenes will have much longer branch lengths.

Running ModelOMatic:
modelomatic_OSX Trpc2_modelomatic.phy bionj Trpc2_modelomatic_normal.txt 0 normal
---------------------------------------------------------------------------------------------
  ModelOMatic (v1.01 (release)).
A program for choosing substitutions models for phylogenetic inference.
Written by Simon Whelan.
Contributions from James Allen, Ben Blackburne and David Talavera.
---------------------------------------------------------------------------------------------
Data: <Trpc2_modelomatic.phy>: 118 sequences of length 489 (DataMatrix: 118 x 380)
Checking for sparse (>85% gaps) sequences
Starting with 118 ... after removal there are 118 sequences
Creating start tree ... 
Assertion failed: (0), function Error, file tools.cxx, line 235.
Unrecognised codon: TAG for data codon position 153 in sequence 52...Abort trap: 6
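
The abort above was triggered by an in-frame TAG, which is a stop codon under the universal genetic code. A quick scan of the alignment can list every such codon before rerunning. The following is only a minimal sketch: the function names are hypothetical, it assumes the sequences are already in frame and passed in as plain strings, and it reports 1-based codon positions, which may or may not match the indexing ModelOMatic uses in its error message.

```python
# Hypothetical helper, not part of ModelOMatic: list in-frame stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stops in the universal genetic code


def find_inframe_stops(seq):
    """Return (codon_position, codon) pairs for in-frame stops (1-based)."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    hits = []
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            hits.append((i // 3 + 1, codon))
    return hits


def scan_alignment(records):
    """records: iterable of (name, sequence). Keep only sequences with stops."""
    report = {}
    for name, seq in records:
        hits = find_inframe_stops(seq)
        if hits:
            report[name] = hits
    return report
```

Calling `scan_alignment` on (name, sequence) pairs read from the PHYLIP file would give one entry per sequence that contains a stop. For pseudogenes, premature stops are expected, so the scan is meant to locate them, not to argue for dropping those sequences.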
105-238:modelomatic loloyohe$ modelomatic_OSX Trpc2_modelomatic.phy bionj Trpc2_modelomatic_normal.txt 0 normal

---------------------------------------------------------------------------------------------
  ModelOMatic (v1.01 (release)).
A program for choosing substitutions models for phylogenetic inference.
Written by Simon Whelan.
Contributions from James Allen, Ben Blackburne and David Talavera.
---------------------------------------------------------------------------------------------
Data: <Trpc2_modelomatic.phy>: 118 sequences of length 489 (DataMatrix: 118 x 380)
Checking for sparse (>85% gaps) sequences
Starting with 118 ... after removal there are 118 sequences
Creating start tree ...  estimated using bionj (0.813652s)
Optimisation settings: normal 
Scanning for modelomatic.ini file ...
Amino acid models skipping rtREV, HIVb, HIVw
Codon models skipping F0, F1X4, F3X4, F64 done
Working with genetic code: Universal
>>> Doing model analysis <<< 
RY Done  (2.84812s)
NT Done  (65.4292s)
AA Done  (267.533s)...
Codon Done  (0.255969s)

Outputting results to <Trpc2_modelomatic_normal.txt>

Successful exit

For once, something worked. The best-fitting model is K2P+4dG: the Kimura two-parameter model with gamma-distributed rate variation across sites, discretized into four rate categories. A good summary of the nucleotide models can be found here. K2P means that transitions (alpha) and transversions (beta) have different rates, but that all four nucleotides occur at the same frequency. This model can easily be implemented in GARLI. K2P is just a simpler HKY model: HKY also allows different transition and transversion rates, but additionally lets the four base frequencies be unequal. I am using GARLI on CIPRES.
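
To make the K2P/HKY relationship concrete, here is a minimal sketch of the two rate matrices in plain Python. The function names and the `kappa` parameter (the transition/transversion rate ratio, alpha/beta in the notation above) are my own choices for illustration, and the matrix is left unnormalised. It only shows that K2P is HKY with every base frequency fixed at 0.25, which is why the GARLI settings pair a two-rate matrix with equal state frequencies.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (x in PURINES) == (y in PURINES)


def hky_rate_matrix(kappa, freqs):
    """Unnormalised HKY rate matrix as a nested dict q[from_base][to_base].

    Off-diagonal rate = (kappa for transitions, 1 for transversions)
    multiplied by the frequency of the target base.
    """
    q = {}
    for x in BASES:
        row = {}
        for y in BASES:
            if x != y:
                weight = kappa if is_transition(x, y) else 1.0
                row[y] = weight * freqs[y]
        row[x] = -sum(row.values())  # each row sums to zero
        q[x] = row
    return q


def k2p_rate_matrix(kappa):
    """K2P is HKY with all four base frequencies fixed at 0.25."""
    return hky_rate_matrix(kappa, {b: 0.25 for b in BASES})
```

With equal frequencies the matrix is symmetric and has only two distinct off-diagonal rates, one for transitions and one for transversions. With unequal frequencies, the same `kappa` gives a different rate into each target base.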

#everything is default except
Maximum hours to run=48
-searchreps=8
-bootstrapreps=0
-datatype=nucleotide
#to make it K2P make sure to change base statefrequencies
-ratematrix=HKY(2rate)
-statefrequencies=fixed
-invariant sites=none
#this is where the 4dG comes in from modelomatic
-ratehetmodel=gamma
-numratecats=4



ERROR: state frequencies specified as fixed, but no Garli block found in Trpc2_garli.phy!!

Oops, it choked. -statefrequencies should be set to equal:

#everything is default except
Maximum hours to run=48
-searchreps=8
-bootstrapreps=0
-datatype=nucleotide
#to make it K2P make sure to change base statefrequencies
-ratematrix=HKY(2rate)
-statefrequencies=equal
-invariant sites=none
#this is where the 4dG comes in from modelomatic
-ratehetmodel=gamma
-numratecats=4

From the best tree from GARLI:

Now onto testing if some branch lengths are significantly longer...
