Friday, May 1, 2015

I am using modelOmatic to determine which substitution models best fit the sequences; it includes comparisons of codon models. Below are my notes on installing and running modelOmatic. For a while I thought I was having Yosemite issues, but because everything comes as precompiled binaries, it all seems to check out okay. Lots of "duh" moments were had.

To install on Mac OSX Yosemite:
[1] Download the binaries and move to where you store your programs.
mv ~/Downloads/modelomatic_OSX /usr/local/bin/
[2] Set modelOmatic into your path.
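vi ~/.bash_profile  #this opens the bash profile in your home directory
#press "i" to edit, then paste this into your .bash_profile
export PATH=/usr/local/bin/modelomatic_OSX:$PATH
#PATH entries must be directories: if modelomatic_OSX is a single binary rather than a folder, the entry you need is /usr/local/bin (often already on the PATH)
#press ESC, then type ":wq!" and press Enter; restart the terminal (or run "source ~/.bash_profile")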
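If you would rather not edit .bash_profile by hand, a small shell function can append the export line for you. This is a sketch, and add_path_dir is my own hypothetical name (it is not part of modelOmatic). It assumes bash with ~/.bash_profile. It refuses anything that is not a directory, since PATH entries are directories, and it skips the append if the identical line is already in the profile.

```bash
# add_path_dir DIR [PROFILE]
# Hypothetical helper: append 'export PATH="DIR:$PATH"' to PROFILE
# (default ~/.bash_profile) unless that exact line is already there.
add_path_dir() {
  local dir="$1"
  local profile="${2:-$HOME/.bash_profile}"
  # PATH entries must be directories, not executables
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # the written line keeps a literal $PATH so it expands at login time
  local line="export PATH=\"$dir:\$PATH\""
  if [ -f "$profile" ] && grep -qxF "$line" "$profile"; then
    echo "already present in $profile"
  else
    printf '%s\n' "$line" >> "$profile"
    echo "added to $profile"
  fi
}
```

For example, run add_path_dir /usr/local/bin/BIONJ and then source ~/.bash_profile.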
[3] ModelOmatic requires a tree file to run. It can make an NJ tree if you have bioNJ set up. Download the binaries for bioNJ here.
[4] Move bioNJ to where you store your programs and set your path. You also have to rename the executable to lower case ("bionj") so that modelOmatic can find it.
mv ~/Downloads/BIONJ/ /usr/local/bin/ 
mv /usr/local/bin/BIONJ/BIONJ /usr/local/bin/BIONJ/bionj

#add to your path variable
vi ~/.bash_profile
export PATH=/usr/local/bin/BIONJ:$PATH
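After restarting the terminal, you can use command -v to confirm that both programs resolve on your PATH. The sketch below loops over executable names and reports where each one resolves. check_tools is a hypothetical name. The sketch assumes bash and that the executables are called modelomatic_OSX and bionj, as in my setup.

```bash
# check_tools NAME...
# Hypothetical helper: report whether each executable resolves on the current PATH.
# Returns 1 if any name is missing.
check_tools() {
  local exe status=0
  for exe in "$@"; do
    if command -v "$exe" >/dev/null 2>&1; then
      echo "found: $exe -> $(command -v "$exe")"
    else
      echo "missing: $exe"
      status=1
    fi
  done
  return "$status"
}
```

Run check_tools modelomatic_OSX bionj. If bionj comes back missing, check that the BIONJ binary was renamed to lower case.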

The input format is:
./ModelOMatic <data> bionj <output> <genetic_code> fast
I am running it as modelomatic_OSX because it is in my path. 
The bionj option tells modelOmatic to build its own starting tree using bioNJ.
The <output> slot is the path of the results file that modelOmatic writes.
The <genetic_code> has been tricky. It can choke and tell you that you have stop codons if you don't use the correct number:
     0:  Universal code
     1:  Vertebrate mt
     2:  Yeast mt
     3:  Mould mt
     4:  Invertebrate mt
     5:  Ciliate nuclear
     6:  Echinoderm mt
     7:  Euplotid mt
     8:  Alternative yeast nuclear
     9:  Ascidian mt
     10: Blepharisma nuclear
     11: Everything codes (64 character state-space)
My adviser was having trouble until she realized she was working with cytb, a mitochondrial gene, so the code should have been "1" (Vertebrate mt) instead of "0".
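The genetic code matters because it decides which codons count as stops. The lookup below lists the stop codons for the two codes I have dealt with so far: 0 (Universal) and 1 (Vertebrate mt). The codon sets are the standard ones from NCBI translation tables 1 and 2. stop_codons is a hypothetical helper, not part of modelOmatic. Under the vertebrate mitochondrial code AGA is a stop, which is consistent with the "Unrecognised codon: AGA" error in the run below made with code 1.

```bash
# stop_codons CODE
# Hypothetical helper: print the stop codons for a modelOmatic genetic-code number.
# Only codes 0 (Universal, NCBI table 1) and 1 (Vertebrate mt, NCBI table 2) are covered.
stop_codons() {
  case "$1" in
    0) echo "TAA TAG TGA" ;;
    1) echo "TAA TAG AGA AGG" ;;   # in vertebrate mitochondria TGA codes for Trp instead
    *) echo "code $1 not covered by this sketch" >&2; return 1 ;;
  esac
}
```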
ModelOmatic takes .phylip files. It can allegedly use other formats, but I have not tried. It does seem remarkably flexible in how it reads a .phylip file.
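Codon models read the alignment three bases at a time, so it is worth sanity-checking a .phylip file before a long run. The awk sketch below uses a hypothetical name, check_phylip. It assumes a sequential, non-interleaved file with an "ntax nchar" header and one "name sequence" pair per line, with no spaces inside the sequence. It checks the sequence count, each sequence length, and that nchar is a multiple of 3.

```bash
# check_phylip FILE
# Hypothetical helper: sanity-check a sequential PHYLIP file. Prints "ok" and
# returns 0 if everything matches; otherwise prints each problem and returns 1.
check_phylip() {
  awk '
    NR == 1 { ntax = $1; nchar = $2; next }   # header: number of taxa, alignment length
    NF >= 2 {
      n++
      if (length($2) != nchar) {
        printf "length mismatch: %s has %d, header says %d\n", $1, length($2), nchar
        bad = 1
      }
    }
    END {
      if (n != ntax) { printf "count mismatch: found %d sequences, header says %d\n", n, ntax; bad = 1 }
      if (nchar % 3 != 0) { printf "nchar %d is not a multiple of 3\n", nchar; bad = 1 }
      if (!bad) print "ok"
      exit bad
    }
  ' "$1"
}
```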
OR_alignments loloyohe$ modelomatic_OSX Phyllos_OR6_Funct_tAlign.phylip bionj test.txt 1

---------------------------------------------------------------------------------------------
  ModelOMatic (v1.01 (release)).
A program for choosing substitutions models for phylogenetic inference.
Written by Simon Whelan.
Contributions from James Allen, Ben Blackburne and David Talavera.
---------------------------------------------------------------------------------------------
Data: <Phyllos_OR6_Funct_tAlign.phylip>: 35 sequences of length 729 (DataMatrix: 35 x 570)
Checking for sparse (>85% gaps) sequences
Starting with 35 ... after removal there are 35 sequences
Creating start tree ... 
Assertion failed: (0), function Error, file tools.cxx, line 235.
Unrecognised codon: AGA for data codon position 241 in sequence 1...Abort trap: 6
We had stop codons. So I translated the sequences, looked for where the stop codons were, and replaced them with "---" in the original .phylip file. (Worth noting: this run used genetic code "1", Vertebrate mt, under which AGA is a stop codon; the later runs on these nuclear OR genes use "0".) Note for the future: all .phylip sequence files in /Volumes/Yango/Hayden_batORs/OR_alignments have the stop codons replaced, while the .nex files have the original sequence alignments. Let's try to get it running...
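I did the masking by translating and editing by hand. The awk sketch below is one way to automate it. mask_stops and STOPS are hypothetical names. It assumes a sequential PHYLIP file ("ntax nchar" header, then "name sequence" per line with no spaces in the sequence) and that every sequence starts at codon position 1. It replaces each in-frame codon listed in STOPS with "---". STOPS defaults to the Universal-code stops, so set STOPS="TAA TAG AGA AGG" for vertebrate mitochondrial data.

```bash
# mask_stops FILE
# Hypothetical helper: print FILE with every in-frame stop codon replaced by "---".
# STOPS (space-separated) overrides the default Universal stops TAA TAG TGA.
mask_stops() {
  local infile="$1"
  local stops="${STOPS:-TAA TAG TGA}"
  awk -v stops="$stops" '
    BEGIN { n = split(stops, s, " "); for (i = 1; i <= n; i++) isstop[s[i]] = 1 }
    NR == 1 { print; next }   # keep the "ntax nchar" header unchanged
    NF < 2  { print; next }   # pass blank lines through
    {
      seq = $2; out = ""
      # walk the sequence one codon at a time in reading frame 1
      for (i = 1; i <= length(seq); i += 3) {
        codon = substr(seq, i, 3)
        out = out (isstop[toupper(codon)] ? "---" : codon)
      }
      print $1, out
    }
  ' "$infile"
}
```

To keep the original file untouched, redirect the output to a new file, e.g. mask_stops in.phylip > in_masked.phylip.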
modelomatic_OSX  /Volumes/Yango/Hayden_batORs/OR_alignments/Phyllos_OR4_Funct_tAlign.phylip bionj /Volumes/Yango/Hayden_batORs/Phyllos_OR4_Funct_tAlign_mOm_results.txt 0 fast
---------------------------------------------------------------------------------------------
  ModelOMatic (v1.01 (release)).
A program for choosing substitutions models for phylogenetic inference.
Written by Simon Whelan.
Contributions from James Allen, Ben Blackburne and David Talavera.
---------------------------------------------------------------------------------------------
Data: </Volumes/Yango/Hayden_batORs/OR_alignments/Phyllos_OR4_Funct_tAlign.phylip>: 127 sequences of length 753 (DataMatrix: 127 x 666)
Checking for sparse (>85% gaps) sequences
Starting with 127 ... after removal there are 127 sequences
Creating start tree ...  estimated using bionj (1.15646s)
Optimisation settings: fast 
Scanning for modelomatic.ini file ... done
Working with genetic code: Universal
>>> Doing model analysis <<< 
RY Done  (3.56514s)
NT Done  (10.6517s)
AA Done  (71.2184s).........
Codon Done  (232.733s)

Outputting results to <Phyllos_OR4_Funct_tAlign_mOm_results.txt>
Successful exit

####now for the others
modelomatic_OSX /Volumes/Yango/Hayden_batORs/OR_alignments/Phyllos_OR6_Funct_tAlign.phylip bionj /Volumes/Yango/Hayden_batORs/Phyllos_OR6_Funct_tAlign_mOm_results.txt 0 fast

modelomatic_OSX  /Volumes/Yango/Hayden_batORs/OR_alignments/Phyllos_OR10_Funct_tAlign.phylip bionj /Volumes/Yango/Hayden_batORs/Phyllos_OR10_Funct_tAlign_mOm_results.txt 0 fast

modelomatic_OSX  /Volumes/Yango/Hayden_batORs/OR_alignments/Phyllos_OR51_Funct_tAlign.phylip bionj /Volumes/Yango/Hayden_batORs/Phyllos_OR51_Funct_tAlign_mOm_results.txt 0 fast
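Rather than typing one command per gene family, a loop can run every alignment in the folder. The sketch below assumes the naming pattern above: *_tAlign.phylip files in one input folder, with results named <basename>_mOm_results.txt. It passes my genetic code and mode as arguments. run_all and DRY_RUN are hypothetical names. With DRY_RUN set it only prints the commands, which is a cheap way to check the paths first.

```bash
# run_all INDIR OUTDIR CODE MODE
# Hypothetical helper: run modelOmatic on every *_tAlign.phylip file in INDIR.
# Set DRY_RUN=1 to print the commands instead of running them.
run_all() {
  local indir="$1" outdir="$2" code="$3" mode="$4"
  local f base out
  for f in "$indir"/*_tAlign.phylip; do
    [ -e "$f" ] || continue   # no match: skip the unexpanded glob
    base="$(basename "$f" .phylip)"
    out="$outdir/${base}_mOm_results.txt"
    if [ -n "${DRY_RUN:-}" ]; then
      echo modelomatic_OSX "$f" bionj "$out" "$code" "$mode"
    else
      modelomatic_OSX "$f" bionj "$out" "$code" "$mode" || echo "non-zero exit for $f" >&2
    fi
  done
}
```

For example: DRY_RUN=1 run_all /Volumes/Yango/Hayden_batORs/OR_alignments /Volumes/Yango/Hayden_batORs 0 fast, then the same command without DRY_RUN=1.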


Had a hiccup with this data:
modelomatic_OSX  /Volumes/Yango/Hayden_batORs/OR_alignments/Phyllos_OR2_13_Funct_tAlign.phylip bionj /Volumes/Yango/Hayden_batORs/model_o_matic/Phyllos_OR2_13_Funct_tAlign_mOm_results.txt 0 normal

---------------------------------------------------------------------------------------------
  ModelOMatic (v1.01 (release)).
A program for choosing substitutions models for phylogenetic inference.
Written by Simon Whelan.
Contributions from James Allen, Ben Blackburne and David Talavera.
---------------------------------------------------------------------------------------------
Data: </Volumes/Yango/Hayden_batORs/OR_alignments/Phyllos_OR2_13_Funct_tAlign.phylip>: 264 sequences of length 1530 (DataMatrix: 264 x 943)
Checking for sparse (>85% gaps) sequences
Starting with 264 ... after removal there are 264 sequences
Creating start tree ...  estimated using bionj (4.68751s)
Optimisation settings: normal 
Scanning for modelomatic.ini file ... done
Working with genetic code: Universal
>>> Doing model analysis <<< 

Broken Prob(): -0.000104812

modelomatic_OSX  /Volumes/Yango/Hayden_batORs/OR_alignments/Phyllos_OR137_Funct_tAlign.phylip bionj /Volumes/Yango/Hayden_batORs/Phyllos_OR137_Funct_tAlign_mOm_results.txt 0 fast

---------------------------------------------------------------------------------------------
  ModelOMatic (v1.01 (release)).
A program for choosing substitutions models for phylogenetic inference.
Written by Simon Whelan.
Contributions from James Allen, Ben Blackburne and David Talavera.
---------------------------------------------------------------------------------------------
Data: </Volumes/Yango/Hayden_batORs/OR_alignments/Phyllos_OR137_Funct_tAlign.phylip>: 452 sequences of length 804 (DataMatrix: 452 x 766)
Checking for sparse (>85% gaps) sequences
Starting with 452 ... after removal there are 452 sequences
Creating start tree ...  estimated using bionj (4.7425s)
Optimisation settings: fast 
Scanning for modelomatic.ini file ... done
Working with genetic code: Universal
>>> Doing model analysis <<< 
RY Done  (8.05337s)
NT Done  (44.7082s)

Broken Prob(): -40.8111

modelomatic_OSX  /Volumes/Yango/Hayden_batORs/OR_alignments/Phyllos_OR52_Funct_tAlign.phylip bionj /Volumes/Yango/Hayden_batORs/Phyllos_OR52_Funct_tAlign_mOm_results.txt 0 fast

---------------------------------------------------------------------------------------------
  ModelOMatic (v1.01 (release)).
A program for choosing substitutions models for phylogenetic inference.
Written by Simon Whelan.
Contributions from James Allen, Ben Blackburne and David Talavera.
---------------------------------------------------------------------------------------------
Data: </Volumes/Yango/Hayden_batORs/OR_alignments/Phyllos_OR51_Funct_tAlign.phylip>: 152 sequences of length 750 (DataMatrix: 152 x 712)
Checking for sparse (>85% gaps) sequences
Starting with 152 ... after removal there are 152 sequences
Creating start tree ...  estimated using bionj (1.32059s)
Optimisation settings: fast 
Scanning for modelomatic.ini file ... done
Working with genetic code: Universal
>>> Doing model analysis <<< 
RY Done  (4.69988s)
NT Done  (13.8241s)
AA Done  (85.6409s).........
...
Broken Prob(): -2.64535e+41
I think the problem is running the "fast" option on large datasets. On this dataset, both "trimfast" and "normal" work.
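Since "fast" sometimes dies with "Broken Prob()" while "normal" finishes, one option is to try modes in order and keep the first that completes. The sketch below treats a run as finished when its log contains "Successful exit", the line printed at the end of the successful runs here. I have not checked modelOmatic's exit codes, so the sketch keys on the log text instead. run_with_fallback is a hypothetical name. Given the caution below about trimfast, I would list only fast and normal.

```bash
# run_with_fallback INPUT OUTPUT CODE MODE...
# Hypothetical helper: try each optimisation mode in turn, logging each attempt to
# OUTPUT.MODE.log, and stop at the first log containing "Successful exit".
# Prints the mode that worked; returns 1 if none did.
run_with_fallback() {
  local input="$1" output="$2" code="$3"
  shift 3
  local mode log
  for mode in "$@"; do
    log="${output}.${mode}.log"
    modelomatic_OSX "$input" bionj "$output" "$code" "$mode" > "$log" 2>&1
    if grep -q "Successful exit" "$log"; then
      echo "$mode"
      return 0
    fi
    echo "mode '$mode' did not finish; see $log" >&2
  done
  return 1
}
```

For example: run_with_fallback Phyllos_OR51_Funct_tAlign.phylip Phyllos_OR51_Funct_tAlign_mOm_results.txt 0 fast normal.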
modelomatic_OSX  /Volumes/Yango/Hayden_batORs/OR_alignments/Phyllos_OR51_Funct_tAlign.phylip bionj /Volumes/Yango/Hayden_batORs/model_o_matic/Phyllos_OR51_Funct_tAlign_mOm_results.txt 0 trimfast

---------------------------------------------------------------------------------------------
  ModelOMatic (v1.01 (release)).
A program for choosing substitutions models for phylogenetic inference.
Written by Simon Whelan.
Contributions from James Allen, Ben Blackburne and David Talavera.
---------------------------------------------------------------------------------------------
Data: </Volumes/Yango/Hayden_batORs/OR_alignments/Phyllos_OR51_Funct_tAlign.phylip>: 152 sequences of length 750 (DataMatrix: 152 x 712)
Checking for sparse (>85% gaps) sequences
Starting with 152 ... after removal there are 152 sequences
Creating start tree ...  estimated using bionj (1.58694s)
Optimisation settings: fast trim=10
TRIMMING: Tree contains 152 sequences (>TrimTree=10) ...
          Loose optimisation of start tree under JC ... done
          Obtaining greedy start tree with 10 sequences ... done
          Reinitialising objects for trimmed data ... done
          New files available: data = </Volumes/Yango/Hayden_batORs/model_o_matic/Phyllos_OR51_Funct_tAlign_mOm_results.txt.trim.data>; tree = </Volumes/Yango/Hayden_batORs/model_o_matic/Phyllos_OR51_Funct_tAlign_mOm_results.txt.trim.tree>
Scanning for modelomatic.ini file ... done
Working with genetic code: Universal
>>> Doing model analysis <<< 
RY Done  (1.08313s)
NT Done  (1.18703s)
AA Done  (6.64464s).........
Codon Done  (24.4936s)

Outputting results to </Volumes/Yango/Hayden_batORs/model_o_matic/Phyllos_OR51_Funct_tAlign_mOm_results.txt>
Successful exit

However, a word of caution: the "trimfast" option gives a vastly different result:


Normal it is! With fewer parameters, it's best just to wait it out.