Showing posts with label paml. Show all posts
Showing posts with label paml. Show all posts

Thursday, March 3, 2016

installing FastCodeML

Trying out a program called FastCodeML to try to get faster estimates of my PAML simulations. Joe Parker wrote a nice little summary on this. However, contrary to Joe's "nothing a little make make install can't handle", it actually...can't handle it.

Installation:
This is optimized for a Linux environment so if you have a Linux machine, you can just run the binary in the downloaded folder.

First visit: ftp://ftp.vital-it.ch/tools/FastCodeML/ and download FastCodeML-1.1.0.tar.gz. (Not completely obvious at first).

Navigate the directory and convince yourself that you can't run the binary.
105-238:FastCodeML-1.1.0 loloyohe$ file fast
fast: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.18, stripped
Nope, sorry..we have to build from source.

The installation guidelines are vague and basically useless.
Here are the install instructions:
Requirements to generate the executable:
* C++ compiler, e.g. GCC 4
* CMake 2.8.0 (including ccmake) or later recommended, although compilation possible without
* Boost::Spirit, see http://boost-spirit.com/home/
* Reasonably new BLAS implementation (e.g. OpenBLAS, Goto2, ACML, MKL); packages from various Linux distributions can be used, but this deteriorates performance; recommended: OpenBLAS (http://xianyi.github.io/OpenBLAS/) or Intel MKL
* Reasonably new LAPACK library (e.g. original LAPACK or ACML, MKL); packages from various Linux distributions can be used, but this deteriorates performance

How to generate the FastCodeML executable:
* Generate BLAS if necessary
* Generate LAPACK if necessary
* Generate NLopt library (http://ab-initio.mit.edu/wiki/index.php/NLopt)
* Edit CMakeLists.txt if necessary
* Set paths for libraries (change and execute SETPATHS)
* "ccmake ." and switch USE_MPI and USE_OPENMP on/off (other default settings should be ok)
* make will create an executable "fast"

Computer system:
* Linux preferred, but sources are portable to other platforms

And here are further "detailed instructions in the extra_install_doc folder for "Mac_Pro":
*) Create NLOpt in ~/lib
./configure --prefix=/home/mac/lib/nlopt
make
make install

*) Copy BLAS and LAPACK to ~/lib
cp libblas.a ~/lib 
cp liblapack.a ~/lib

*) Setting environment variables
export BLAS_LIB_DIR="/home/mac/lib" #we want to flexible here, hence we do not specify /usr/lib
export LAPACK_LIB_DIR="/home/mac/lib" #we want to flexible here, hence we do not specify /usr/lib
export NLOPT_LIB_DIR="/home/mac/lib/nlopt/lib"
export NLOPT_INCLUDE_DIR="/home/mac/lib/nlopt/include"
export MATH_LIB_NAMES="blas;lapack;lapack;blas;gfortranbegin;gfortran"
#export MPI_INCLUDE_PATH="/usr/include" #might not be necessary if CXX set correctly
#export MPI_LIBRARY="/usr/lib" #might not be necessary if CXX set correctly
export CXX="/usr/bin/mpicxx.mpich2" #remember to run as mpirun.mpich2 -np 2 ./fast

This is bringing back a past "lapack" nightmare I had several months ago. Looks like its going to be great fun.

We have a "homebrew" environment. While mostly good, it has confused installers about where our libraries are and which compiler to use. That, and along with Mavericks OSX being horribly configured make this extra challenging. I first try to brew install everything they ask for. Basically, if you have a homebrew setup, don't do anything the instructions tell you.

Really could only get boost installed
105-238:FastCodeML-1.1.0 loloyohe$ brew install boost
Lapack and BLAS fail miserably. Luckily, these seem to be optional so when we go to ./configure, we can turn these things off. Instead of a normal make/install setup, this program is set up for ccmake (note: NOT cmake). Fun! This means when you run it, it is looking for "CMakeLists.txt". This is the file you edit.

After much hair-pulling, here is what my CMakeLists.txt. (Everything else stayed the same).
# Get the configuration switches
OPTION(USE_LAPACK         "Use BLAS/LAPACK" OFF)
OPTION(USE_MKL_VML         "Use Intel MKL vectorized routines" OFF)
OPTION(USE_OPENMP         "Compile with OpenMP support" OFF)
OPTION(USE_MPI             "Use MPI for high level parallelization" OFF)
if(NOT WIN32)
OPTION(BUILD_NOT_SHARED   "Build FastCodeML not shared" OFF)
endif(NOT WIN32)
OPTION(BUILD_SEARCH_MPI   "Search for MPI installation?" OFF)
OPTION(USE_ORIGINAL_PROPORTIONS "Use the original CodeML proportion definition" OFF)
SET(USE_LIKELIHOOD_METHOD "Original" CACHE STRING "Select the type of likelihood computation method: Original, NonRecursive, FatVector, DAG")
SET_PROPERTY(CACHE USE_LIKELIHOOD_METHOD PROPERTY STRINGS Original NonRecursive FatVector DAG)
OPTION(USE_IDENTITY_MATRIX "Force identity matrix when time is zero" OFF)
OPTION(USE_CPV_SCALING "Scale conditional probability vectors to avoid under/overflow" OFF)

Now "ccmake" was a new experience for me. It wasn't working when I just typed "ccmake ." as instructed. I don't know why. But anyways, if i did:
105-238:FastCodeML-1.1.0 loloyohe$ ccmake /Applications/FastCodeML-1.1.0/
then a new interface shows up in the terminal, basically showing what I had switched on and off. It was not intuitive to me what was happening, but I just kept pressing "enter" "n" and "g" until finally I got out of the screen. Makefile, can I haz? I can haz!!!

105-238:FastCodeML-1.1.0 loloyohe$ make
Scanning dependencies of target fast
[  4%] Building CXX object CMakeFiles/fast.dir/fast.cpp.o
[  8%] Building CXX object CMakeFiles/fast.dir/CmdLine.cpp.o
[ 12%] Building CXX object CMakeFiles/fast.dir/Genes.cpp.o
[ 16%] Building CXX object CMakeFiles/fast.dir/Phylip.cpp.o
[ 20%] Building CXX object CMakeFiles/fast.dir/PhyloTree.cpp.o
[ 24%] Building CXX object CMakeFiles/fast.dir/Newick.cpp.o
[ 28%] Building CXX object CMakeFiles/fast.dir/TreeNode.cpp.o
[ 32%] Building CXX object CMakeFiles/fast.dir/BayesTest.cpp.o
[ 36%] Building CXX object CMakeFiles/fast.dir/FillMatrix.cpp.o
[ 40%] Building CXX object CMakeFiles/fast.dir/Forest.cpp.o
[ 44%] Building CXX object CMakeFiles/fast.dir/TransitionMatrix.cpp.o
[ 48%] Building CXX object CMakeFiles/fast.dir/BranchSiteModel.cpp.o
/Applications/FastCodeML-1.1.0/BranchSiteModel.cpp:381:27: warning: comparison of unsigned expression < 0 is always false
      [-Wtautological-compare]
        else if(aValidLen < 0)
                ~~~~~~~~~ ^ ~
1 warning generated.
[ 52%] Building CXX object CMakeFiles/fast.dir/ProbabilityMatrixSet.cpp.o
[ 56%] Building CXX object CMakeFiles/fast.dir/FatVectorTransform.cpp.o
[ 60%] Building CXX object CMakeFiles/fast.dir/CodonFrequencies.cpp.o
[ 64%] Building CXX object CMakeFiles/fast.dir/AlignedAllocator.cpp.o
/Applications/FastCodeML-1.1.0/AlignedAllocator.cpp:22:10: fatal error: 'malloc.h' file not found
#include <malloc.h>
         ^
1 error generated.
make[2]: *** [CMakeFiles/fast.dir/AlignedAllocator.cpp.o] Error 1
make[1]: *** [CMakeFiles/fast.dir/all] Error 2

make: *** [all] Error 2
We are getting closer. Now its time to get hacky. 
Basically, comment out the malloc.h in anything that uses it.

Open up "AlignedAllocator.cpp".
Change: 
#include <malloc.h>
To:
//#include <malloc.h>

Try again!
105-238:FastCodeML-1.1.0 loloyohe$ make
Scanning dependencies of target fast
[  4%] Building CXX object CMakeFiles/fast.dir/AlignedAllocator.cpp.o
[  8%] Building CXX object CMakeFiles/fast.dir/HighLevelCoordinator.cpp.o
[ 12%] Building CXX object CMakeFiles/fast.dir/CodeMLoptimizer.cpp.o
[ 16%] Building CXX object CMakeFiles/fast.dir/ForestExport.cpp.o
[ 20%] Building CXX object CMakeFiles/fast.dir/ParseParameters.cpp.o
[ 24%] Building CXX object CMakeFiles/fast.dir/VerbosityLevels.cpp.o
[ 28%] Building CXX object CMakeFiles/fast.dir/DAGScheduler.cpp.o
[ 32%] Building CXX object CMakeFiles/fast.dir/TreeAndSetsDependencies.cpp.o
[ 36%] Building CXX object CMakeFiles/fast.dir/WriteResults.cpp.o
[ 40%] Linking CXX executable fast
[100%] Built target fast

There, that's better.
105-238:FastCodeML-1.1.0 loloyohe$ file fast
fast: Mach-O 64-bit executable x86_64

More on execution later.

Thursday, July 23, 2015

PAML lab

Exercise 1:The objective of this activity is to use CODEML to evaluate the likelihood of the GstD1 sequences for a variety of ω values. Plot log-likelihood scores against the values of ω and determine the maximum likelihood estimate of ω. Check your finding by running CODEML’s hill-climbing algorithm.
#run this command to execute:

codeml ex1_codeml.ctl

This is what the control file looks like: 
      seqfile = seqfile.txt         * sequence data filename
      outfile = results_0.010.txt   * main result file name

        noisy = 9      * 0,1,2,3,9: how much rubbish on the screen
      verbose = 1      * 1:detailed output
      runmode = -2     * -2:pairwise

      seqtype = 1      * 1:codons
    CodonFreq = 3      * 0:equal, 1:F1X4, 2:F3X4, 3:F61
        model = 0      *
      NSsites = 0      *
        icode = 0      * 0:universal code

    fix_kappa = 0      * 1:kappa fixed, 0:kappa to be estimated
        kappa = 2      * initial or fixed kappa

    fix_omega = 1      * 1:omega fixed, 0:omega to be estimated
       * omega = 0.001  * 1st fixed omega value [change this]

       * EXCERCISE 1
       *alternate fixed omega values
       *omega = 0.005  * 2nd fixed value
       *omega = 0.01   * 3rd fixed value
       *omega = 0.05   * 4th fixed value
       *omega = 0.10   * 5th fixed value
       *omega = 0.20   * 6th fixed value
       *omega = 0.40   * 7th fixed value
       *omega = 0.80   * 8th fixed value
       *omega = 1.60   * 9th fixed value
       omega = 2.00   * 10th fixed value

The MLE of ω is 0.04.

Now we are going to have PAML estimate ω. In the control file, set fix_omega=0.
------------------
2 (Sim) ... 1 (Mel)
lnL = -756.568200
  0.12527  2.53119  0.06698


t= 0.1253  S=    45.2  N=   554.8  dN/dS= 0.0670  dN= 0.0204  dS= 0.3041
---------------------

Exercise 2: In this exercise you will investigate the sensitivity of your estimate of ω to the transition/transversion ratio (κ), and to the assumed model for codon frequencies (π’s). After you collect the required data you will determine which assumptions yield the largest and smallest values of S, and what effect it as on the value of ω.

Here is the control file:
      seqfile = seqfile.txt   * sequence data filename
      outfile = results.txt   * main result file name

        noisy = 9      * 0,1,2,3,9: how much rubbish on the screen
      verbose = 1      * 1:detailed output
      runmode = -2     * -2:pairwise

      seqtype = 1      * 1:codons
    CodonFreq = 0      * 0:equal, 1:F1X4, 2:F3X4, 3:F61          [change this]
        model = 0      *
      NSsites = 0      *
        icode = 0      * 0:universal code

    fix_kappa = 1      * 1:kappa fixed, 0:kappa to be estimated  [change this]
        kappa = 1      * initial or fixed kappa

    fix_omega = 0      * 1:omega fixed, 0:omega to be estimated
        omega = 0.5    * initial omega value


                       * EXCERCISE 2
                       * Codon bias = none; Ts/Tv bias = none
                       * Codon bias = none; Ts/Tv bias = Yes (ML)

                       * Codon bias = yes (F3x4); Ts/Tv bias = none
                       * Codon bias = yes (F3x4); Ts/Tv bias = Yes (ML)

                       * Codon bias = yes (F61); Ts/Tv bias = none

                       * Codon bias = yes (F61); Ts/Tv bias = Yes (ML)

Test#1  Codon bias = none; Ts/Tv bias = none
CodonFreq = 0      * 0:equal, 
fix_kappa = 1      * 1:kappa fixed,

2 (Sim) ... 1 (Mel)
lnL = -927.178200
  0.10683  0.27421


t= 0.1068  S=   152.9  N=   447.1  dN/dS= 0.2742  dN= 0.0213  dS= 0.0776
Test#2 Codon bias = none; Ts/Tv bias = Yes (ML)
*Highlighted is the ML estimate of of sequences kappa (ts/tv) 
CodonFreq = 0      * 0:equal, 
fix_kappa = 0      * 1:kappa fixed0:kappa to be estimated

2 (Sim) ... 1 (Mel)
lnL = -926.280300
  0.10533  1.88371  0.32004 


t= 0.1053  S=   165.8  N=   434.2  dN/dS= 0.3200  dN= 0.0221  dS= 0.0691

Test#3 Codon bias = yes (F3x4); Ts/Tv bias = none
CodonFreq = 2      * 2 F3X4, 
fix_kappa = 1      * 1:kappa fixed,

2 (Sim) ... 1 (Mel)
lnL = -844.507294
  0.10684  0.11807


t= 0.1068  S=    70.6  N=   529.4  dN/dS= 0.1181  dN= 0.0189  dS= 0.1605

Test#4 Codon bias = yes (F3x4); Ts/Tv bias = Yes (ML)
CodonFreq = 2      * 2 F3X4, 
fix_kappa = 0      * 0:kappa to be estimated

2 (Sim) ... 1 (Mel)
lnL = -842.214727
  0.10686  2.71034  0.12661


t= 0.1069  S=    73.4  N=   526.6  dN/dS= 0.1266  dN= 0.0193  dS= 0.1526

Test #5 Codon bias = yes (F61); Ts/Tv bias = none
CodonFreq = 3      * 3 F61

fix_kappa = 1      * 1:kappa fixed,

2 (Sim) ... 1 (Mel)
lnL = -758.551863
  0.12089  0.06278

t= 0.1209  S=    40.5  N=   559.5  dN/dS= 0.0628  dN= 0.0201  dS= 0.3198

Test #6 Codon bias = yes (F61); Ts/Tv bias = Yes (ML)
CodonFreq = 3      * 3 F61


fix_kappa = 0      * 0:kappa to be estimated
2 (Sim) ... 1 (Mel)
lnL = -756.568200
  0.12527  2.53121  0.06698


t= 0.1253  S=    45.2  N=   554.8  dN/dS= 0.0670  dN= 0.0204  dS= 0.3040

Summarized in a table


Assumptions K S N dS dN omega lnL
Fequal + k=1 1 152.9 447.1 0.021 0.078 0.27 -927.1782
Fequal + k=estimated 1.88 165.8 434.2 0.07 0.022 0.32 -926.2803
F3X4 + k=1 1 70.6 529.4 0.16 0.019 0.118 -844.507294
F3X4 + k=estimated 2.71 73.4 526.6 0.15 0.02 0.127 -842.214727
F61 + k=1 1 40.5 559.5 0.32 0.02 0.063 -758.551863
F61 + k=estimated 2.53 45.2 554.8 0.3 0.02 0.067 -756.5682

The assumption that yields the smallest value of S is F61 + k=1. It changes the value of omega by an order of magnitude.

Exercise 3: The objective of this exercise is to use three LRTs to evaluate the following hypotheses: (1) the mutation rate of Ldh-C has increased relative to Ldh-A, (2) a burst of positive selection for functional divergence occurred following the duplication event that gave rise to Ldh-C, and (3) there was a long term shift in selective constraints following the duplication event that gave rise to Ldh-C.

Here is the control file:
      seqfile = seqfile.txt   * sequence data filename
     treefile = treeH0.txt      * tree structure file name
      outfile = results.txt   * main result file name

        noisy = 9      * 0,1,2,3,9: how much rubbish on the screen
      verbose = 1      * 1:detailed output
      runmode = 0      * 0:user defined tree

      seqtype = 1      * 1:codons
    CodonFreq = 2      * 0:equal, 1:F1X4, 2:F3X4, 3:F61

        model = 0      * 0:one omega ratio for all branches
                       * 1:separate omega for each branch
                       * 2:user specified dN/dS ratios for branches

      NSsites = 0      *

        icode = 0      * 0:universal code

    fix_kappa = 0      * 1:kappa fixed, 0:kappa to be estimated
        kappa = 2      * initial or fixed kappa

    fix_omega = 0      * 1:omega fixed, 0:omega to be estimated
        omega = 0.2    * initial omega

                       * comments:
                       * H0 in Table 3: model = 0
                       * H1 in Table 3: model = 2
                       * H2 in Table 3: model = 2

                       * H3 in Table 3: model = 2

H0: homogeneous selection pressure over the tree
(X02152Hom,U07178Sus,(M22585rab,((NM017025Rat,U13687Mus),(((AF070995C,(X04752Mus,U07177Rat)),(U95378Sus,U13680Hom)),(X53828OG1,U28410OG2)))));


Here is the output (lnL highlighted in yellow):
TREE #  1:  (1, 2, (5, ((3, 4), (((6, (8, 9)), (7, 10)), (11, 12)))));   MP score: -1
lnL(ntime: 21  np: 23):  -6018.633010     +0.000000
  13..1    13..2    13..14   14..5    14..15   15..16   16..3    16..4    15..17   17..18   18..19   19..6    19..20   20..8    20..9    18..21   21..7    21..10   17..22   22..11   22..12 
 0.165986 0.145030 0.060255 0.252090 0.030239 0.230093 0.133935 0.101768 0.125411 0.692354 0.249468 0.205980 0.294515 0.131299 0.213038 0.174959 0.155890 0.172753 0.321380 0.290395 0.356759 2.401783 0.136656

Note: Branch length is defined as number of nucleotide substitutions per codon (not per neucleotide site).

tree length =   4.50360

(1: 0.165986, 2: 0.145030, (5: 0.252090, ((3: 0.133935, 4: 0.101768): 0.230093, (((6: 0.205980, (8: 0.131299, 9: 0.213038): 0.294515): 0.249468, (7: 0.155890, 10: 0.172753): 0.174959): 0.692354, (11: 0.290395, 12: 0.356759): 0.321380): 0.125411): 0.030239): 0.060255);

(X02152Hom: 0.165986, U07178Sus: 0.145030, (M22585rab: 0.252090, ((NM017025Rat: 0.133935, U13687Mus: 0.101768): 0.230093, (((AF070995C: 0.205980, (X04752Mus: 0.131299, U07177Rat: 0.213038): 0.294515): 0.249468, (U95378Sus: 0.155890, U13680Hom: 0.172753): 0.174959): 0.692354, (X53828OG1: 0.290395, U28410OG2: 0.356759): 0.321380): 0.125411): 0.030239): 0.060255);

Detailed output identifying parameters

kappa (ts/tv) =  2.40178

omega (dN/dS) =  0.13666

dN & dS for each branch

 branch           t        N        S    dN/dS       dN       dS   N*dN   S*dS

  13..1       0.166    729.3    269.7   0.1367   0.0205   0.1497   14.9   40.4
  13..2       0.145    729.3    269.7   0.1367   0.0179   0.1308   13.0   35.3
  13..14      0.060    729.3    269.7   0.1367   0.0074   0.0543    5.4   14.7
  14..5       0.252    729.3    269.7   0.1367   0.0311   0.2273   22.7   61.3
  14..15      0.030    729.3    269.7   0.1367   0.0037   0.0273    2.7    7.4
  15..16      0.230    729.3    269.7   0.1367   0.0283   0.2075   20.7   55.9
  16..3       0.134    729.3    269.7   0.1367   0.0165   0.1208   12.0   32.6
  16..4       0.102    729.3    269.7   0.1367   0.0125   0.0918    9.1   24.7
  15..17      0.125    729.3    269.7   0.1367   0.0155   0.1131   11.3   30.5
  17..18      0.692    729.3    269.7   0.1367   0.0853   0.6242   62.2  168.3
  18..19      0.249    729.3    269.7   0.1367   0.0307   0.2249   22.4   60.7
  19..6       0.206    729.3    269.7   0.1367   0.0254   0.1857   18.5   50.1
  19..20      0.295    729.3    269.7   0.1367   0.0363   0.2655   26.5   71.6
  20..8       0.131    729.3    269.7   0.1367   0.0162   0.1184   11.8   31.9
  20..9       0.213    729.3    269.7   0.1367   0.0262   0.1921   19.1   51.8
  18..21      0.175    729.3    269.7   0.1367   0.0216   0.1577   15.7   42.5
  21..7       0.156    729.3    269.7   0.1367   0.0192   0.1406   14.0   37.9
  21..10      0.173    729.3    269.7   0.1367   0.0213   0.1558   15.5   42.0
  17..22      0.321    729.3    269.7   0.1367   0.0396   0.2898   28.9   78.1
  22..11      0.290    729.3    269.7   0.1367   0.0358   0.2618   26.1   70.6
  22..12      0.357    729.3    269.7   0.1367   0.0440   0.3217   32.1   86.7

tree length for dN:      0.55489
tree length for dS:      4.06048

H1: episodic change in selection pressure in Ldh-C (only in the branch that immediately follows the gene duplication event).
Here is the tree file for this:
(X02152Hom,U07178Sus,(M22585rab,((NM017025Rat,U13687Mus),(((AF070995C,(X04752Mus,U07177Rat)),(U95378Sus,U13680Hom))#1,(X53828OG1,U28410OG2)))));

Output:
TREE #  1:  (1, 2, (5, ((3, 4), (((6, (8, 9)), (7, 10)), (11, 12)))));   MP score: -1
lnL(ntime: 21  np: 24):  -6017.572460     +0.000000
  13..1    13..2    13..14   14..5    14..15   15..16   16..3    16..4    15..17   17..18   18..19   19..6    19..20   20..8    20..9    18..21   21..7    21..10   17..22   22..11   22..12 
 0.166139 0.145654 0.060220 0.253769 0.027868 0.234240 0.134451 0.101620 0.136088 0.615641 0.250544 0.206999 0.295643 0.131224 0.213679 0.176223 0.155609 0.173597 0.319034 0.291976 0.357768 2.397123 0.131877 0.192033

Note: Branch length is defined as number of nucleotide substitutions per codon (not per nucleotide site).

tree length =   4.44798

(1: 0.166139, 2: 0.145654, (5: 0.253769, ((3: 0.134451, 4: 0.101620): 0.234240, (((6: 0.206999, (8: 0.131224, 9: 0.213679): 0.295643): 0.250544, (7: 0.155609, 10: 0.173597): 0.176223): 0.615641, (11: 0.291976, 12: 0.357768): 0.319034): 0.136088): 0.027868): 0.060220);

(X02152Hom: 0.166139, U07178Sus: 0.145654, (M22585rab: 0.253769, ((NM017025Rat: 0.134451, U13687Mus: 0.101620): 0.234240, (((AF070995C: 0.206999, (X04752Mus: 0.131224, U07177Rat: 0.213679): 0.295643): 0.250544, (U95378Sus: 0.155609, U13680Hom: 0.173597): 0.176223): 0.615641, (X53828OG1: 0.291976, U28410OG2: 0.357768): 0.319034): 0.136088): 0.027868): 0.060220);

Detailed output identifying parameters

kappa (ts/tv) =  2.39712

dN & dS for each branch

 branch           t        N        S    dN/dS       dN       dS   N*dN   S*dS

  13..1       0.166    729.4    269.6   0.1319   0.0199   0.1512   14.5   40.8
  13..2       0.146    729.4    269.6   0.1319   0.0175   0.1326   12.8   35.7
  13..14      0.060    729.4    269.6   0.1319   0.0072   0.0548    5.3   14.8
  14..5       0.254    729.4    269.6   0.1319   0.0305   0.2310   22.2   62.3
  14..15      0.028    729.4    269.6   0.1319   0.0033   0.0254    2.4    6.8
  15..16      0.234    729.4    269.6   0.1319   0.0281   0.2132   20.5   57.5
  16..3       0.134    729.4    269.6   0.1319   0.0161   0.1224   11.8   33.0
  16..4       0.102    729.4    269.6   0.1319   0.0122   0.0925    8.9   24.9
  15..17      0.136    729.4    269.6   0.1319   0.0163   0.1239   11.9   33.4
  17..18      0.616    729.4    269.6   0.1920   0.0961   0.5004   70.1  134.9
  18..19      0.251    729.4    269.6   0.1319   0.0301   0.2281   21.9   61.5
  19..6       0.207    729.4    269.6   0.1319   0.0249   0.1884   18.1   50.8
  19..20      0.296    729.4    269.6   0.1319   0.0355   0.2691   25.9   72.6
  20..8       0.131    729.4    269.6   0.1319   0.0158   0.1195   11.5   32.2
  20..9       0.214    729.4    269.6   0.1319   0.0257   0.1945   18.7   52.4
  18..21      0.176    729.4    269.6   0.1319   0.0212   0.1604   15.4   43.3
  21..7       0.156    729.4    269.6   0.1319   0.0187   0.1417   13.6   38.2
  21..10      0.174    729.4    269.6   0.1319   0.0208   0.1580   15.2   42.6
  17..22      0.319    729.4    269.6   0.1319   0.0383   0.2904   27.9   78.3
  22..11      0.292    729.4    269.6   0.1319   0.0351   0.2658   25.6   71.7
  22..12      0.358    729.4    269.6   0.1319   0.0430   0.3257   31.3   87.8

tree length for dN:      0.55619
tree length for dS:      3.98922

dS tree:
(X02152Hom: 0.151246, U07178Sus: 0.132597, (M22585rab: 0.231020, ((NM017025Rat: 0.122398, U13687Mus: 0.092510): 0.213241, (((AF070995C: 0.188443, (X04752Mus: 0.119460, U07177Rat: 0.194523): 0.269141): 0.228084, (U95378Sus: 0.141659, U13680Hom: 0.158035): 0.160426): 0.500425, (X53828OG1: 0.265802, U28410OG2: 0.325696): 0.290434): 0.123889): 0.025370): 0.054822);
dN tree:
(X02152Hom: 0.019946, U07178Sus: 0.017487, (M22585rab: 0.030466, ((NM017025Rat: 0.016142, U13687Mus: 0.012200): 0.028122, (((AF070995C: 0.024851, (X04752Mus: 0.015754, U07177Rat: 0.025653): 0.035494): 0.030079, (U95378Sus: 0.018682, U13680Hom: 0.020841): 0.021157): 0.096098, (X53828OG1: 0.035053, U28410OG2: 0.042952): 0.038302): 0.016338): 0.003346): 0.007230);

w ratios as labels for TreeView:
(X02152Hom #0.1319 : 0.019946, U07178Sus #0.1319 : 0.017487, (M22585rab #0.1319 : 0.030466, ((NM017025Rat #0.1319 : 0.016142, U13687Mus #0.1319 : 0.012200) #0.1319 : 0.028122, (((AF070995C #0.1319 : 0.024851, (X04752Mus #0.1319 : 0.015754, U07177Rat #0.1319 : 0.025653) #0.1319 : 0.035494) #0.1319 : 0.030079, (U95378Sus #0.1319 : 0.018682, U13680Hom #0.1319 : 0.020841) #0.1319 : 0.021157)#1 #0.1920 : 0.096098, (X53828OG1 #0.1319 : 0.035053, U28410OG2 #0.1319 : 0.042952) #0.1319 : 0.038302) #0.1319 : 0.016338) #0.1319 : 0.003346) #0.1319 : 0.007230);

H2: Long term shift in selection pressure in Ldh-C only; Ldh-C has a permanent change in selection pressure (as compared to its ancestors) whereas Ldh-A remains subject to the ancestral level of selection pressure.
(X02152Hom,U07178Sus,(M22585rab,((NM017025Rat,U13687Mus),(((AF070995C #1,(X04752Mus #1,U07177Rat #1)#1)#1,(U95378Sus #1,U13680Hom #1)#1)#1,(X53828OG1,U28410OG2)))));

Output:
TREE #  1:  (1, 2, (5, ((3, 4), (((6, (8, 9)), (7, 10)), (11, 12)))));   MP score: -1
lnL(ntime: 21  np: 24):  -5985.635237     +0.000000
  13..1    13..2    13..14   14..5    14..15   15..16   16..3    16..4    15..17   17..18   18..19   19..6    19..20   20..8    20..9    18..21   21..7    21..10   17..22   22..11   22..12 
 0.171224 0.151409 0.058980 0.269203 0.017888 0.262039 0.138108 0.103579 0.168914 0.560187 0.231662 0.203731 0.272797 0.130747 0.204423 0.162430 0.151155 0.167095 0.366793 0.312139 0.396584 2.403414 0.075516 0.239425

Note: Branch length is defined as number of nucleotide substitutions per codon (not per neucleotide site).

tree length =   4.50108

(1: 0.171224, 2: 0.151409, (5: 0.269203, ((3: 0.138108, 4: 0.103579): 0.262039, (((6: 0.203731, (8: 0.130747, 9: 0.204423): 0.272797): 0.231662, (7: 0.151155, 10: 0.167095): 0.162430): 0.560187, (11: 0.312139, 12: 0.396584): 0.366793): 0.168914): 0.017888): 0.058980);

(X02152Hom: 0.171224, U07178Sus: 0.151409, (M22585rab: 0.269203, ((NM017025Rat: 0.138108, U13687Mus: 0.103579): 0.262039, (((AF070995C: 0.203731, (X04752Mus: 0.130747, U07177Rat: 0.204423): 0.272797): 0.231662, (U95378Sus: 0.151155, U13680Hom: 0.167095): 0.162430): 0.560187, (X53828OG1: 0.312139, U28410OG2: 0.396584): 0.366793): 0.168914): 0.017888): 0.058980);

Detailed output identifying parameters

kappa (ts/tv) =  2.40341

dN & dS for each branch

 branch           t        N        S    dN/dS       dN       dS   N*dN   S*dS

  13..1       0.171    729.3    269.7   0.0755   0.0133   0.1756    9.7   47.3
  13..2       0.151    729.3    269.7   0.0755   0.0117   0.1552    8.5   41.9
  13..14      0.059    729.3    269.7   0.0755   0.0046   0.0605    3.3   16.3
  14..5       0.269    729.3    269.7   0.0755   0.0208   0.2760   15.2   74.4
  14..15      0.018    729.3    269.7   0.0755   0.0014   0.0183    1.0    4.9
  15..16      0.262    729.3    269.7   0.0755   0.0203   0.2687   14.8   72.5
  16..3       0.138    729.3    269.7   0.0755   0.0107   0.1416    7.8   38.2
  16..4       0.104    729.3    269.7   0.0755   0.0080   0.1062    5.8   28.6
  15..17      0.169    729.3    269.7   0.0755   0.0131   0.1732    9.5   46.7
  17..18      0.560    729.3    269.7   0.2394   0.1005   0.4198   73.3  113.2
  18..19      0.232    729.3    269.7   0.2394   0.0416   0.1736   30.3   46.8
  19..6       0.204    729.3    269.7   0.2394   0.0366   0.1527   26.7   41.2
  19..20      0.273    729.3    269.7   0.2394   0.0490   0.2045   35.7   55.1
  20..8       0.131    729.3    269.7   0.2394   0.0235   0.0980   17.1   26.4
  20..9       0.204    729.3    269.7   0.2394   0.0367   0.1532   26.8   41.3
  18..21      0.162    729.3    269.7   0.2394   0.0291   0.1217   21.3   32.8
  21..7       0.151    729.3    269.7   0.2394   0.0271   0.1133   19.8   30.6
  21..10      0.167    729.3    269.7   0.2394   0.0300   0.1252   21.9   33.8
  17..22      0.367    729.3    269.7   0.0755   0.0284   0.3761   20.7  101.4
  22..11      0.312    729.3    269.7   0.0755   0.0242   0.3200   17.6   86.3
  22..12      0.397    729.3    269.7   0.0755   0.0307   0.4066   22.4  109.7

tree length for dN:      0.56113
tree length for dS:      4.04015

dS tree:
(X02152Hom: 0.175561, U07178Sus: 0.155244, (M22585rab: 0.276022, ((NM017025Rat: 0.141606, U13687Mus: 0.106203): 0.268676, (((AF070995C: 0.152691, (X04752Mus: 0.097991, U07177Rat: 0.153209): 0.204454): 0.173624, (U95378Sus: 0.113287, U13680Hom: 0.125233): 0.121737): 0.419845, (X53828OG1: 0.320046, U28410OG2: 0.406630): 0.376085): 0.173193): 0.018341): 0.060474);
dN tree:
(X02152Hom: 0.013258, U07178Sus: 0.011723, (M22585rab: 0.020844, ((NM017025Rat: 0.010693, U13687Mus: 0.008020): 0.020289, (((AF070995C: 0.036558, (X04752Mus: 0.023462, U07177Rat: 0.036682): 0.048951): 0.041570, (U95378Sus: 0.027124, U13680Hom: 0.029984): 0.029147): 0.100521, (X53828OG1: 0.024169, U28410OG2: 0.030707): 0.028400): 0.013079): 0.001385): 0.004567);

w ratios as labels for TreeView:
(X02152Hom #0.0755 : 0.013258, U07178Sus #0.0755 : 0.011723, (M22585rab #0.0755 : 0.020844, ((NM017025Rat #0.0755 : 0.010693, U13687Mus #0.0755 : 0.008020) #0.0755 : 0.020289, (((AF070995C#1 #0.2394 : 0.036558, (X04752Mus#1 #0.2394 : 0.023462, U07177Rat#1 #0.2394 : 0.036682)#1 #0.2394 : 0.048951)#1 #0.2394 : 0.041570, (U95378Sus#1 #0.2394 : 0.027124, U13680Hom#1 #0.2394 : 0.029984)#1 #0.2394 : 0.029147)#1 #0.2394 : 0.100521, (X53828OG1 #0.0755 : 0.024169, U28410OG2 #0.0755 : 0.030707) #0.0755 : 0.028400) #0.0755 : 0.013079) #0.0755 : 0.001385) #0.0755 : 0.004567);


Time used:  0:56
H3: Long term shift in selection on both Ldh-C and Ldh-A; those lineages are subject to selection pressures different from each other and from the ancestor.
Tree file:
(X02152Hom,U07178Sus,(M22585rab,((NM017025Rat,U13687Mus),(((AF070995C #1,(X04752Mus #1,U07177Rat #1)#1)#1,(U95378Sus #1,U13680Hom #1)#1)#1,(X53828OG1 #2,U28410OG2 #2)#2))));

Output:
TREE #  1:  (1, 2, (5, ((3, 4), (((6, (8, 9)), (7, 10)), (11, 12)))));   MP score: -1
lnL(ntime: 21  np: 25):  -5984.111015     +0.000000
  13..1    13..2    13..14   14..5    14..15   15..16   16..3    16..4    15..17   17..18   18..19   19..6    19..20   20..8    20..9    18..21   21..7    21..10   17..22   22..11   22..12 
 0.173367 0.152089 0.058499 0.273176 0.018609 0.265853 0.138699 0.104431 0.178874 0.556050 0.231889 0.203607 0.272766 0.130870 0.204220 0.161922 0.150629 0.167529 0.341576 0.302793 0.379370 2.399088 0.064218 0.240341 0.094898

Note: Branch length is defined as number of nucleotide substitutions per codon (not per neucleotide site).

tree length =   4.46682

(1: 0.173367, 2: 0.152089, (5: 0.273176, ((3: 0.138699, 4: 0.104431): 0.265853, (((6: 0.203607, (8: 0.130870, 9: 0.204220): 0.272766): 0.231889, (7: 0.150629, 10: 0.167529): 0.161922): 0.556050, (11: 0.302793, 12: 0.379370): 0.341576): 0.178874): 0.018609): 0.058499);

(X02152Hom: 0.173367, U07178Sus: 0.152089, (M22585rab: 0.273176, ((NM017025Rat: 0.138699, U13687Mus: 0.104431): 0.265853, (((AF070995C: 0.203607, (X04752Mus: 0.130870, U07177Rat: 0.204220): 0.272766): 0.231889, (U95378Sus: 0.150629, U13680Hom: 0.167529): 0.161922): 0.556050, (X53828OG1: 0.302793, U28410OG2: 0.379370): 0.341576): 0.178874): 0.018609): 0.058499);

Detailed output identifying parameters

kappa (ts/tv) =  2.39909

dN & dS for each branch

 branch           t        N        S    dN/dS       dN       dS   N*dN   S*dS

  13..1       0.173    729.4    269.6   0.0642   0.0117   0.1824    8.5   49.2
  13..2       0.152    729.4    269.6   0.0642   0.0103   0.1600    7.5   43.1
  13..14      0.058    729.4    269.6   0.0642   0.0040   0.0616    2.9   16.6
  14..5       0.273    729.4    269.6   0.0642   0.0185   0.2874   13.5   77.5
  14..15      0.019    729.4    269.6   0.0642   0.0013   0.0196    0.9    5.3
  15..16      0.266    729.4    269.6   0.0642   0.0180   0.2797   13.1   75.4
  16..3       0.139    729.4    269.6   0.0642   0.0094   0.1459    6.8   39.4
  16..4       0.104    729.4    269.6   0.0642   0.0071   0.1099    5.1   29.6
  15..17      0.179    729.4    269.6   0.0642   0.0121   0.1882    8.8   50.7
  17..18      0.556    729.4    269.6   0.2403   0.1000   0.4162   73.0  112.2
  18..19      0.232    729.4    269.6   0.2403   0.0417   0.1736   30.4   46.8
  19..6       0.204    729.4    269.6   0.2403   0.0366   0.1524   26.7   41.1
  19..20      0.273    729.4    269.6   0.2403   0.0491   0.2041   35.8   55.0
  20..8       0.131    729.4    269.6   0.2403   0.0235   0.0979   17.2   26.4
  20..9       0.204    729.4    269.6   0.2403   0.0367   0.1528   26.8   41.2
  18..21      0.162    729.4    269.6   0.2403   0.0291   0.1212   21.2   32.7
  21..7       0.151    729.4    269.6   0.2403   0.0271   0.1127   19.8   30.4
  21..10      0.168    729.4    269.6   0.2403   0.0301   0.1254   22.0   33.8
  17..22      0.342    729.4    269.6   0.0949   0.0319   0.3357   23.2   90.5
  22..11      0.303    729.4    269.6   0.0949   0.0282   0.2976   20.6   80.2
  22..12      0.379    729.4    269.6   0.0949   0.0354   0.3728   25.8  100.5

tree length for dN:      0.56167
tree length for dS:      3.99725

dS tree:
(X02152Hom: 0.182422, U07178Sus: 0.160033, (M22585rab: 0.287445, ((NM017025Rat: 0.145944, U13687Mus: 0.109885): 0.279740, (((AF070995C: 0.152387, (X04752Mus: 0.097947, U07177Rat: 0.152845): 0.204148): 0.173554, (U95378Sus: 0.112736, U13680Hom: 0.125384): 0.121188): 0.416167, (X53828OG1: 0.297568, U28410OG2: 0.372824): 0.335682): 0.188217): 0.019581): 0.061554);
dN tree:
(X02152Hom: 0.011715, U07178Sus: 0.010277, (M22585rab: 0.018459, ((NM017025Rat: 0.009372, U13687Mus: 0.007057): 0.017964, (((AF070995C: 0.036625, (X04752Mus: 0.023541, U07177Rat: 0.036735): 0.049065): 0.041712, (U95378Sus: 0.027095, U13680Hom: 0.030135): 0.029127): 0.100022, (X53828OG1: 0.028239, U28410OG2: 0.035380): 0.031856): 0.012087): 0.001257): 0.003953);

w ratios as labels for TreeView:
(X02152Hom #0.0642 : 0.011715, U07178Sus #0.0642 : 0.010277, (M22585rab #0.0642 : 0.018459, ((NM017025Rat #0.0642 : 0.009372, U13687Mus #0.0642 : 0.007057) #0.0642 : 0.017964, (((AF070995C#1 #0.2403 : 0.036625, (X04752Mus#1 #0.2403 : 0.023541, U07177Rat#1 #0.2403 : 0.036735)#1 #0.2403 : 0.049065)#1 #0.2403 : 0.041712, (U95378Sus#1 #0.2403 : 0.027095, U13680Hom#1 #0.2403 : 0.030135)#1 #0.2403 : 0.029127)#1 #0.2403 : 0.100022, (X53828OG1#2 #0.0949 : 0.028239, U28410OG2#2 #0.0949 : 0.035380)#2 #0.0949 : 0.031856) #0.0642 : 0.012087) #0.0642 : 0.001257) #0.0642 : 0.003953);


Time used:  1:10

In addition, carry out likelihood ratio tests (LRT) of the hypotheses below. See the lecture notes for additional details about LRTs. Use 1 degree of freedom to obtain the P-value for each LRT.

Summary table of hypotheses:
Model ω_A0 ω_A1 ω_C1 ω_C0 lnL LRT
H0 0.1367 0.1367 0.1367 0.1367 -6018.633
H1 0.1319 0.1319 0.1319 0.192 -6017.572
H2 0.0755 0.0755 0.2394 0.2394 -5985.635
H3 0.0949 0.0642 0.2403 0.2403 -5984.111

You should change the name of the main result file (via outfile= in the control file) or you will overwrite your previous results.

Exercise 4The objective of this exercise is to use a series of LRTs to test for sites evolving under positive selection in the nef gene. If you find significant evidence for positive selection, then identify the involved sites by using empirical Bayes methods.

       seqfile = seqfile.txt          * sequence data filename

    * treefile = treefile_M0.txt      * SET THIS for tree file with ML branch lengths under M0
    * treefile = treefile_M1.txt      * SET THIS for tree file with ML branch lengths under M1
    * treefile = treefile_M2.txt      * SET THIS for tree file with ML branch lengths under M2
    * treefile = treefile_M3.txt      * SET THIS for tree file with ML branch lengths under M3
    * treefile = treefile_M7.txt      * SET THIS for tree file with ML branch lengths under M7
    * treefile = treefile_M8.txt      * SET THIS for tree file with ML branch lengths under M8

      outfile = results.txt           * main result file name
        noisy = 9                     * lots of rubbish on the screen
      verbose = 1                     * detailed output
      runmode = 0                     * user defined tree
      seqtype = 1                     * codons
    CodonFreq = 2                     * F3X4 for codon ferquencies
        model = 0                     * one omega ratio for all branches

    * NSsites = 0                     * SET THIS for M0
    * NSsites = 1                     * SET THIS for M1
    * NSsites = 2                     * SET THIS for M2
    * NSsites = 3                     * SET THIS for M3
    * NSsites = 7                     * SET THIS for M7
    * NSsites = 8                     * SET THIS for M8

        icode = 0                     * universal code
    fix_kappa = 1                     * kappa fixed
      * kappa = 4.43491               * SET THIS to fix kappa at MLE under M0
      * kappa = 4.39117               * SET THIS to fix kappa at MLE under M1
      * kappa = 5.08964               * SET THIS to fix kappa at MLE under M2
      * kappa = 4.89033               * SET THIS to fix kappa at MLE under M3
      * kappa = 4.22750               * SET THIS to fix kappa at MLE under M7
      * kappa = 4.87827               * SET THIS to fix kappa at MLE under M8

    fix_omega = 0                     * omega to be estimated 
        omega = 5                     * initial omega

      * ncatG = 3                     * SET THIS for 3 site categories under M3         
      * ncatG = 10                    * SET THIS for 10 of site categories under M7 and M8

  fix_blength = 2                     * fixed branch lengths from tree file
sdsf