Thursday, July 23, 2015

PAML lab

Exercise 1:The objective of this activity is to use CODEML to evaluate the likelihood of the GstD1 sequences for a variety of ω values. Plot log-likelihood scores against the values of ω and determine the maximum likelihood estimate of ω. Check your finding by running CODEML’s hill-climbing algorithm.
#run this command to execute:

codeml ex1_codeml.ctl

This is what the control file looks like: 
      seqfile = seqfile.txt         * sequence data filename
      outfile = results_0.010.txt   * main result file name

        noisy = 9      * 0,1,2,3,9: how much rubbish on the screen
      verbose = 1      * 1:detailed output
      runmode = -2     * -2:pairwise

      seqtype = 1      * 1:codons
    CodonFreq = 3      * 0:equal, 1:F1X4, 2:F3X4, 3:F61
        model = 0      *
      NSsites = 0      *
        icode = 0      * 0:universal code

    fix_kappa = 0      * 1:kappa fixed, 0:kappa to be estimated
        kappa = 2      * initial or fixed kappa

    fix_omega = 1      * 1:omega fixed, 0:omega to be estimated
       * omega = 0.001  * 1st fixed omega value [change this]

       * EXCERCISE 1
       *alternate fixed omega values
       *omega = 0.005  * 2nd fixed value
       *omega = 0.01   * 3rd fixed value
       *omega = 0.05   * 4th fixed value
       *omega = 0.10   * 5th fixed value
       *omega = 0.20   * 6th fixed value
       *omega = 0.40   * 7th fixed value
       *omega = 0.80   * 8th fixed value
       *omega = 1.60   * 9th fixed value
       omega = 2.00   * 10th fixed value

The MLE of ω is 0.04.

Now we are going to have PAML estimate ω. In the control file, set fix_omega=0.
------------------
2 (Sim) ... 1 (Mel)
lnL = -756.568200
  0.12527  2.53119  0.06698


t= 0.1253  S=    45.2  N=   554.8  dN/dS= 0.0670  dN= 0.0204  dS= 0.3041
---------------------

Exercise 2: In this exercise you will investigate the sensitivity of your estimate of ω to the transition/transversion ratio (κ), and to the assumed model for codon frequencies (π’s). After you collect the required data you will determine which assumptions yield the largest and smallest values of S, and what effect it as on the value of ω.

Here is the control file:
      seqfile = seqfile.txt   * sequence data filename
      outfile = results.txt   * main result file name

        noisy = 9      * 0,1,2,3,9: how much rubbish on the screen
      verbose = 1      * 1:detailed output
      runmode = -2     * -2:pairwise

      seqtype = 1      * 1:codons
    CodonFreq = 0      * 0:equal, 1:F1X4, 2:F3X4, 3:F61          [change this]
        model = 0      *
      NSsites = 0      *
        icode = 0      * 0:universal code

    fix_kappa = 1      * 1:kappa fixed, 0:kappa to be estimated  [change this]
        kappa = 1      * initial or fixed kappa

    fix_omega = 0      * 1:omega fixed, 0:omega to be estimated
        omega = 0.5    * initial omega value


                       * EXCERCISE 2
                       * Codon bias = none; Ts/Tv bias = none
                       * Codon bias = none; Ts/Tv bias = Yes (ML)

                       * Codon bias = yes (F3x4); Ts/Tv bias = none
                       * Codon bias = yes (F3x4); Ts/Tv bias = Yes (ML)

                       * Codon bias = yes (F61); Ts/Tv bias = none

                       * Codon bias = yes (F61); Ts/Tv bias = Yes (ML)

Test#1  Codon bias = none; Ts/Tv bias = none
CodonFreq = 0      * 0:equal, 
fix_kappa = 1      * 1:kappa fixed,

2 (Sim) ... 1 (Mel)
lnL = -927.178200
  0.10683  0.27421


t= 0.1068  S=   152.9  N=   447.1  dN/dS= 0.2742  dN= 0.0213  dS= 0.0776
Test#2 Codon bias = none; Ts/Tv bias = Yes (ML)
*Highlighted is the ML estimate of of sequences kappa (ts/tv) 
CodonFreq = 0      * 0:equal, 
fix_kappa = 0      * 1:kappa fixed0:kappa to be estimated

2 (Sim) ... 1 (Mel)
lnL = -926.280300
  0.10533  1.88371  0.32004 


t= 0.1053  S=   165.8  N=   434.2  dN/dS= 0.3200  dN= 0.0221  dS= 0.0691

Test#3 Codon bias = yes (F3x4); Ts/Tv bias = none
CodonFreq = 2      * 2 F3X4, 
fix_kappa = 1      * 1:kappa fixed,

2 (Sim) ... 1 (Mel)
lnL = -844.507294
  0.10684  0.11807


t= 0.1068  S=    70.6  N=   529.4  dN/dS= 0.1181  dN= 0.0189  dS= 0.1605

Test#4 Codon bias = yes (F3x4); Ts/Tv bias = Yes (ML)
CodonFreq = 2      * 2 F3X4, 
fix_kappa = 0      * 0:kappa to be estimated

2 (Sim) ... 1 (Mel)
lnL = -842.214727
  0.10686  2.71034  0.12661


t= 0.1069  S=    73.4  N=   526.6  dN/dS= 0.1266  dN= 0.0193  dS= 0.1526

Test #5 Codon bias = yes (F61); Ts/Tv bias = none
CodonFreq = 3      * 3 F61

fix_kappa = 1      * 1:kappa fixed,

2 (Sim) ... 1 (Mel)
lnL = -758.551863
  0.12089  0.06278

t= 0.1209  S=    40.5  N=   559.5  dN/dS= 0.0628  dN= 0.0201  dS= 0.3198

Test #6 Codon bias = yes (F61); Ts/Tv bias = Yes (ML)
CodonFreq = 3      * 3 F61


fix_kappa = 0      * 0:kappa to be estimated
2 (Sim) ... 1 (Mel)
lnL = -756.568200
  0.12527  2.53121  0.06698


t= 0.1253  S=    45.2  N=   554.8  dN/dS= 0.0670  dN= 0.0204  dS= 0.3040

Summarized in a table


Assumptions K S N dS dN omega lnL
Fequal + k=1 1 152.9 447.1 0.021 0.078 0.27 -927.1782
Fequal + k=estimated 1.88 165.8 434.2 0.07 0.022 0.32 -926.2803
F3X4 + k=1 1 70.6 529.4 0.16 0.019 0.118 -844.507294
F3X4 + k=estimated 2.71 73.4 526.6 0.15 0.02 0.127 -842.214727
F61 + k=1 1 40.5 559.5 0.32 0.02 0.063 -758.551863
F61 + k=estimated 2.53 45.2 554.8 0.3 0.02 0.067 -756.5682

The assumption that yields the smallest value of S is F61 + k=1. It changes the value of omega by an order of magnitude.

Exercise 3: The objective of this exercise is to use three LRTs to evaluate the following hypotheses: (1) the mutation rate of Ldh-C has increased relative to Ldh-A, (2) a burst of positive selection for functional divergence occurred following the duplication event that gave rise to Ldh-C, and (3) there was a long term shift in selective constraints following the duplication event that gave rise to Ldh-C.

Here is the control file:
      seqfile = seqfile.txt   * sequence data filename
     treefile = treeH0.txt      * tree structure file name
      outfile = results.txt   * main result file name

        noisy = 9      * 0,1,2,3,9: how much rubbish on the screen
      verbose = 1      * 1:detailed output
      runmode = 0      * 0:user defined tree

      seqtype = 1      * 1:codons
    CodonFreq = 2      * 0:equal, 1:F1X4, 2:F3X4, 3:F61

        model = 0      * 0:one omega ratio for all branches
                       * 1:separate omega for each branch
                       * 2:user specified dN/dS ratios for branches

      NSsites = 0      *

        icode = 0      * 0:universal code

    fix_kappa = 0      * 1:kappa fixed, 0:kappa to be estimated
        kappa = 2      * initial or fixed kappa

    fix_omega = 0      * 1:omega fixed, 0:omega to be estimated
        omega = 0.2    * initial omega

                       * comments:
                       * H0 in Table 3: model = 0
                       * H1 in Table 3: model = 2
                       * H2 in Table 3: model = 2

                       * H3 in Table 3: model = 2

H0: homogeneous selection pressure over the tree
(X02152Hom,U07178Sus,(M22585rab,((NM017025Rat,U13687Mus),(((AF070995C,(X04752Mus,U07177Rat)),(U95378Sus,U13680Hom)),(X53828OG1,U28410OG2)))));


Here is the output (lnL highlighted in yellow):
TREE #  1:  (1, 2, (5, ((3, 4), (((6, (8, 9)), (7, 10)), (11, 12)))));   MP score: -1
lnL(ntime: 21  np: 23):  -6018.633010     +0.000000
  13..1    13..2    13..14   14..5    14..15   15..16   16..3    16..4    15..17   17..18   18..19   19..6    19..20   20..8    20..9    18..21   21..7    21..10   17..22   22..11   22..12 
 0.165986 0.145030 0.060255 0.252090 0.030239 0.230093 0.133935 0.101768 0.125411 0.692354 0.249468 0.205980 0.294515 0.131299 0.213038 0.174959 0.155890 0.172753 0.321380 0.290395 0.356759 2.401783 0.136656

Note: Branch length is defined as number of nucleotide substitutions per codon (not per neucleotide site).

tree length =   4.50360

(1: 0.165986, 2: 0.145030, (5: 0.252090, ((3: 0.133935, 4: 0.101768): 0.230093, (((6: 0.205980, (8: 0.131299, 9: 0.213038): 0.294515): 0.249468, (7: 0.155890, 10: 0.172753): 0.174959): 0.692354, (11: 0.290395, 12: 0.356759): 0.321380): 0.125411): 0.030239): 0.060255);

(X02152Hom: 0.165986, U07178Sus: 0.145030, (M22585rab: 0.252090, ((NM017025Rat: 0.133935, U13687Mus: 0.101768): 0.230093, (((AF070995C: 0.205980, (X04752Mus: 0.131299, U07177Rat: 0.213038): 0.294515): 0.249468, (U95378Sus: 0.155890, U13680Hom: 0.172753): 0.174959): 0.692354, (X53828OG1: 0.290395, U28410OG2: 0.356759): 0.321380): 0.125411): 0.030239): 0.060255);

Detailed output identifying parameters

kappa (ts/tv) =  2.40178

omega (dN/dS) =  0.13666

dN & dS for each branch

 branch           t        N        S    dN/dS       dN       dS   N*dN   S*dS

  13..1       0.166    729.3    269.7   0.1367   0.0205   0.1497   14.9   40.4
  13..2       0.145    729.3    269.7   0.1367   0.0179   0.1308   13.0   35.3
  13..14      0.060    729.3    269.7   0.1367   0.0074   0.0543    5.4   14.7
  14..5       0.252    729.3    269.7   0.1367   0.0311   0.2273   22.7   61.3
  14..15      0.030    729.3    269.7   0.1367   0.0037   0.0273    2.7    7.4
  15..16      0.230    729.3    269.7   0.1367   0.0283   0.2075   20.7   55.9
  16..3       0.134    729.3    269.7   0.1367   0.0165   0.1208   12.0   32.6
  16..4       0.102    729.3    269.7   0.1367   0.0125   0.0918    9.1   24.7
  15..17      0.125    729.3    269.7   0.1367   0.0155   0.1131   11.3   30.5
  17..18      0.692    729.3    269.7   0.1367   0.0853   0.6242   62.2  168.3
  18..19      0.249    729.3    269.7   0.1367   0.0307   0.2249   22.4   60.7
  19..6       0.206    729.3    269.7   0.1367   0.0254   0.1857   18.5   50.1
  19..20      0.295    729.3    269.7   0.1367   0.0363   0.2655   26.5   71.6
  20..8       0.131    729.3    269.7   0.1367   0.0162   0.1184   11.8   31.9
  20..9       0.213    729.3    269.7   0.1367   0.0262   0.1921   19.1   51.8
  18..21      0.175    729.3    269.7   0.1367   0.0216   0.1577   15.7   42.5
  21..7       0.156    729.3    269.7   0.1367   0.0192   0.1406   14.0   37.9
  21..10      0.173    729.3    269.7   0.1367   0.0213   0.1558   15.5   42.0
  17..22      0.321    729.3    269.7   0.1367   0.0396   0.2898   28.9   78.1
  22..11      0.290    729.3    269.7   0.1367   0.0358   0.2618   26.1   70.6
  22..12      0.357    729.3    269.7   0.1367   0.0440   0.3217   32.1   86.7

tree length for dN:      0.55489
tree length for dS:      4.06048

H1: episodic change in selection pressure in Ldh-C (only in the branch that immediately follows the gene duplication event).
Here is the tree file for this:
(X02152Hom,U07178Sus,(M22585rab,((NM017025Rat,U13687Mus),(((AF070995C,(X04752Mus,U07177Rat)),(U95378Sus,U13680Hom))#1,(X53828OG1,U28410OG2)))));

Output:
TREE #  1:  (1, 2, (5, ((3, 4), (((6, (8, 9)), (7, 10)), (11, 12)))));   MP score: -1
lnL(ntime: 21  np: 24):  -6017.572460     +0.000000
  13..1    13..2    13..14   14..5    14..15   15..16   16..3    16..4    15..17   17..18   18..19   19..6    19..20   20..8    20..9    18..21   21..7    21..10   17..22   22..11   22..12 
 0.166139 0.145654 0.060220 0.253769 0.027868 0.234240 0.134451 0.101620 0.136088 0.615641 0.250544 0.206999 0.295643 0.131224 0.213679 0.176223 0.155609 0.173597 0.319034 0.291976 0.357768 2.397123 0.131877 0.192033

Note: Branch length is defined as number of nucleotide substitutions per codon (not per nucleotide site).

tree length =   4.44798

(1: 0.166139, 2: 0.145654, (5: 0.253769, ((3: 0.134451, 4: 0.101620): 0.234240, (((6: 0.206999, (8: 0.131224, 9: 0.213679): 0.295643): 0.250544, (7: 0.155609, 10: 0.173597): 0.176223): 0.615641, (11: 0.291976, 12: 0.357768): 0.319034): 0.136088): 0.027868): 0.060220);

(X02152Hom: 0.166139, U07178Sus: 0.145654, (M22585rab: 0.253769, ((NM017025Rat: 0.134451, U13687Mus: 0.101620): 0.234240, (((AF070995C: 0.206999, (X04752Mus: 0.131224, U07177Rat: 0.213679): 0.295643): 0.250544, (U95378Sus: 0.155609, U13680Hom: 0.173597): 0.176223): 0.615641, (X53828OG1: 0.291976, U28410OG2: 0.357768): 0.319034): 0.136088): 0.027868): 0.060220);

Detailed output identifying parameters

kappa (ts/tv) =  2.39712

dN & dS for each branch

 branch           t        N        S    dN/dS       dN       dS   N*dN   S*dS

  13..1       0.166    729.4    269.6   0.1319   0.0199   0.1512   14.5   40.8
  13..2       0.146    729.4    269.6   0.1319   0.0175   0.1326   12.8   35.7
  13..14      0.060    729.4    269.6   0.1319   0.0072   0.0548    5.3   14.8
  14..5       0.254    729.4    269.6   0.1319   0.0305   0.2310   22.2   62.3
  14..15      0.028    729.4    269.6   0.1319   0.0033   0.0254    2.4    6.8
  15..16      0.234    729.4    269.6   0.1319   0.0281   0.2132   20.5   57.5
  16..3       0.134    729.4    269.6   0.1319   0.0161   0.1224   11.8   33.0
  16..4       0.102    729.4    269.6   0.1319   0.0122   0.0925    8.9   24.9
  15..17      0.136    729.4    269.6   0.1319   0.0163   0.1239   11.9   33.4
  17..18      0.616    729.4    269.6   0.1920   0.0961   0.5004   70.1  134.9
  18..19      0.251    729.4    269.6   0.1319   0.0301   0.2281   21.9   61.5
  19..6       0.207    729.4    269.6   0.1319   0.0249   0.1884   18.1   50.8
  19..20      0.296    729.4    269.6   0.1319   0.0355   0.2691   25.9   72.6
  20..8       0.131    729.4    269.6   0.1319   0.0158   0.1195   11.5   32.2
  20..9       0.214    729.4    269.6   0.1319   0.0257   0.1945   18.7   52.4
  18..21      0.176    729.4    269.6   0.1319   0.0212   0.1604   15.4   43.3
  21..7       0.156    729.4    269.6   0.1319   0.0187   0.1417   13.6   38.2
  21..10      0.174    729.4    269.6   0.1319   0.0208   0.1580   15.2   42.6
  17..22      0.319    729.4    269.6   0.1319   0.0383   0.2904   27.9   78.3
  22..11      0.292    729.4    269.6   0.1319   0.0351   0.2658   25.6   71.7
  22..12      0.358    729.4    269.6   0.1319   0.0430   0.3257   31.3   87.8

tree length for dN:      0.55619
tree length for dS:      3.98922

dS tree:
(X02152Hom: 0.151246, U07178Sus: 0.132597, (M22585rab: 0.231020, ((NM017025Rat: 0.122398, U13687Mus: 0.092510): 0.213241, (((AF070995C: 0.188443, (X04752Mus: 0.119460, U07177Rat: 0.194523): 0.269141): 0.228084, (U95378Sus: 0.141659, U13680Hom: 0.158035): 0.160426): 0.500425, (X53828OG1: 0.265802, U28410OG2: 0.325696): 0.290434): 0.123889): 0.025370): 0.054822);
dN tree:
(X02152Hom: 0.019946, U07178Sus: 0.017487, (M22585rab: 0.030466, ((NM017025Rat: 0.016142, U13687Mus: 0.012200): 0.028122, (((AF070995C: 0.024851, (X04752Mus: 0.015754, U07177Rat: 0.025653): 0.035494): 0.030079, (U95378Sus: 0.018682, U13680Hom: 0.020841): 0.021157): 0.096098, (X53828OG1: 0.035053, U28410OG2: 0.042952): 0.038302): 0.016338): 0.003346): 0.007230);

w ratios as labels for TreeView:
(X02152Hom #0.1319 : 0.019946, U07178Sus #0.1319 : 0.017487, (M22585rab #0.1319 : 0.030466, ((NM017025Rat #0.1319 : 0.016142, U13687Mus #0.1319 : 0.012200) #0.1319 : 0.028122, (((AF070995C #0.1319 : 0.024851, (X04752Mus #0.1319 : 0.015754, U07177Rat #0.1319 : 0.025653) #0.1319 : 0.035494) #0.1319 : 0.030079, (U95378Sus #0.1319 : 0.018682, U13680Hom #0.1319 : 0.020841) #0.1319 : 0.021157)#1 #0.1920 : 0.096098, (X53828OG1 #0.1319 : 0.035053, U28410OG2 #0.1319 : 0.042952) #0.1319 : 0.038302) #0.1319 : 0.016338) #0.1319 : 0.003346) #0.1319 : 0.007230);

H2: Long term shift in selection pressure in Ldh-C only; Ldh-C has a permanent change in selection pressure (as compared to its ancestors) whereas Ldh-A remains subject to the ancestral level of selection pressure.
(X02152Hom,U07178Sus,(M22585rab,((NM017025Rat,U13687Mus),(((AF070995C #1,(X04752Mus #1,U07177Rat #1)#1)#1,(U95378Sus #1,U13680Hom #1)#1)#1,(X53828OG1,U28410OG2)))));

Output:
TREE #  1:  (1, 2, (5, ((3, 4), (((6, (8, 9)), (7, 10)), (11, 12)))));   MP score: -1
lnL(ntime: 21  np: 24):  -5985.635237     +0.000000
  13..1    13..2    13..14   14..5    14..15   15..16   16..3    16..4    15..17   17..18   18..19   19..6    19..20   20..8    20..9    18..21   21..7    21..10   17..22   22..11   22..12 
 0.171224 0.151409 0.058980 0.269203 0.017888 0.262039 0.138108 0.103579 0.168914 0.560187 0.231662 0.203731 0.272797 0.130747 0.204423 0.162430 0.151155 0.167095 0.366793 0.312139 0.396584 2.403414 0.075516 0.239425

Note: Branch length is defined as number of nucleotide substitutions per codon (not per neucleotide site).

tree length =   4.50108

(1: 0.171224, 2: 0.151409, (5: 0.269203, ((3: 0.138108, 4: 0.103579): 0.262039, (((6: 0.203731, (8: 0.130747, 9: 0.204423): 0.272797): 0.231662, (7: 0.151155, 10: 0.167095): 0.162430): 0.560187, (11: 0.312139, 12: 0.396584): 0.366793): 0.168914): 0.017888): 0.058980);

(X02152Hom: 0.171224, U07178Sus: 0.151409, (M22585rab: 0.269203, ((NM017025Rat: 0.138108, U13687Mus: 0.103579): 0.262039, (((AF070995C: 0.203731, (X04752Mus: 0.130747, U07177Rat: 0.204423): 0.272797): 0.231662, (U95378Sus: 0.151155, U13680Hom: 0.167095): 0.162430): 0.560187, (X53828OG1: 0.312139, U28410OG2: 0.396584): 0.366793): 0.168914): 0.017888): 0.058980);

Detailed output identifying parameters

kappa (ts/tv) =  2.40341

dN & dS for each branch

 branch           t        N        S    dN/dS       dN       dS   N*dN   S*dS

  13..1       0.171    729.3    269.7   0.0755   0.0133   0.1756    9.7   47.3
  13..2       0.151    729.3    269.7   0.0755   0.0117   0.1552    8.5   41.9
  13..14      0.059    729.3    269.7   0.0755   0.0046   0.0605    3.3   16.3
  14..5       0.269    729.3    269.7   0.0755   0.0208   0.2760   15.2   74.4
  14..15      0.018    729.3    269.7   0.0755   0.0014   0.0183    1.0    4.9
  15..16      0.262    729.3    269.7   0.0755   0.0203   0.2687   14.8   72.5
  16..3       0.138    729.3    269.7   0.0755   0.0107   0.1416    7.8   38.2
  16..4       0.104    729.3    269.7   0.0755   0.0080   0.1062    5.8   28.6
  15..17      0.169    729.3    269.7   0.0755   0.0131   0.1732    9.5   46.7
  17..18      0.560    729.3    269.7   0.2394   0.1005   0.4198   73.3  113.2
  18..19      0.232    729.3    269.7   0.2394   0.0416   0.1736   30.3   46.8
  19..6       0.204    729.3    269.7   0.2394   0.0366   0.1527   26.7   41.2
  19..20      0.273    729.3    269.7   0.2394   0.0490   0.2045   35.7   55.1
  20..8       0.131    729.3    269.7   0.2394   0.0235   0.0980   17.1   26.4
  20..9       0.204    729.3    269.7   0.2394   0.0367   0.1532   26.8   41.3
  18..21      0.162    729.3    269.7   0.2394   0.0291   0.1217   21.3   32.8
  21..7       0.151    729.3    269.7   0.2394   0.0271   0.1133   19.8   30.6
  21..10      0.167    729.3    269.7   0.2394   0.0300   0.1252   21.9   33.8
  17..22      0.367    729.3    269.7   0.0755   0.0284   0.3761   20.7  101.4
  22..11      0.312    729.3    269.7   0.0755   0.0242   0.3200   17.6   86.3
  22..12      0.397    729.3    269.7   0.0755   0.0307   0.4066   22.4  109.7

tree length for dN:      0.56113
tree length for dS:      4.04015

dS tree:
(X02152Hom: 0.175561, U07178Sus: 0.155244, (M22585rab: 0.276022, ((NM017025Rat: 0.141606, U13687Mus: 0.106203): 0.268676, (((AF070995C: 0.152691, (X04752Mus: 0.097991, U07177Rat: 0.153209): 0.204454): 0.173624, (U95378Sus: 0.113287, U13680Hom: 0.125233): 0.121737): 0.419845, (X53828OG1: 0.320046, U28410OG2: 0.406630): 0.376085): 0.173193): 0.018341): 0.060474);
dN tree:
(X02152Hom: 0.013258, U07178Sus: 0.011723, (M22585rab: 0.020844, ((NM017025Rat: 0.010693, U13687Mus: 0.008020): 0.020289, (((AF070995C: 0.036558, (X04752Mus: 0.023462, U07177Rat: 0.036682): 0.048951): 0.041570, (U95378Sus: 0.027124, U13680Hom: 0.029984): 0.029147): 0.100521, (X53828OG1: 0.024169, U28410OG2: 0.030707): 0.028400): 0.013079): 0.001385): 0.004567);

w ratios as labels for TreeView:
(X02152Hom #0.0755 : 0.013258, U07178Sus #0.0755 : 0.011723, (M22585rab #0.0755 : 0.020844, ((NM017025Rat #0.0755 : 0.010693, U13687Mus #0.0755 : 0.008020) #0.0755 : 0.020289, (((AF070995C#1 #0.2394 : 0.036558, (X04752Mus#1 #0.2394 : 0.023462, U07177Rat#1 #0.2394 : 0.036682)#1 #0.2394 : 0.048951)#1 #0.2394 : 0.041570, (U95378Sus#1 #0.2394 : 0.027124, U13680Hom#1 #0.2394 : 0.029984)#1 #0.2394 : 0.029147)#1 #0.2394 : 0.100521, (X53828OG1 #0.0755 : 0.024169, U28410OG2 #0.0755 : 0.030707) #0.0755 : 0.028400) #0.0755 : 0.013079) #0.0755 : 0.001385) #0.0755 : 0.004567);


Time used:  0:56
H3: Long term shift in selection on both Ldh-C and Ldh-A; those lineages are subject to selection pressures different from each other and from the ancestor.
Tree file:
(X02152Hom,U07178Sus,(M22585rab,((NM017025Rat,U13687Mus),(((AF070995C #1,(X04752Mus #1,U07177Rat #1)#1)#1,(U95378Sus #1,U13680Hom #1)#1)#1,(X53828OG1 #2,U28410OG2 #2)#2))));

Output:
TREE #  1:  (1, 2, (5, ((3, 4), (((6, (8, 9)), (7, 10)), (11, 12)))));   MP score: -1
lnL(ntime: 21  np: 25):  -5984.111015     +0.000000
  13..1    13..2    13..14   14..5    14..15   15..16   16..3    16..4    15..17   17..18   18..19   19..6    19..20   20..8    20..9    18..21   21..7    21..10   17..22   22..11   22..12 
 0.173367 0.152089 0.058499 0.273176 0.018609 0.265853 0.138699 0.104431 0.178874 0.556050 0.231889 0.203607 0.272766 0.130870 0.204220 0.161922 0.150629 0.167529 0.341576 0.302793 0.379370 2.399088 0.064218 0.240341 0.094898

Note: Branch length is defined as number of nucleotide substitutions per codon (not per neucleotide site).

tree length =   4.46682

(1: 0.173367, 2: 0.152089, (5: 0.273176, ((3: 0.138699, 4: 0.104431): 0.265853, (((6: 0.203607, (8: 0.130870, 9: 0.204220): 0.272766): 0.231889, (7: 0.150629, 10: 0.167529): 0.161922): 0.556050, (11: 0.302793, 12: 0.379370): 0.341576): 0.178874): 0.018609): 0.058499);

(X02152Hom: 0.173367, U07178Sus: 0.152089, (M22585rab: 0.273176, ((NM017025Rat: 0.138699, U13687Mus: 0.104431): 0.265853, (((AF070995C: 0.203607, (X04752Mus: 0.130870, U07177Rat: 0.204220): 0.272766): 0.231889, (U95378Sus: 0.150629, U13680Hom: 0.167529): 0.161922): 0.556050, (X53828OG1: 0.302793, U28410OG2: 0.379370): 0.341576): 0.178874): 0.018609): 0.058499);

Detailed output identifying parameters

kappa (ts/tv) =  2.39909

dN & dS for each branch

 branch           t        N        S    dN/dS       dN       dS   N*dN   S*dS

  13..1       0.173    729.4    269.6   0.0642   0.0117   0.1824    8.5   49.2
  13..2       0.152    729.4    269.6   0.0642   0.0103   0.1600    7.5   43.1
  13..14      0.058    729.4    269.6   0.0642   0.0040   0.0616    2.9   16.6
  14..5       0.273    729.4    269.6   0.0642   0.0185   0.2874   13.5   77.5
  14..15      0.019    729.4    269.6   0.0642   0.0013   0.0196    0.9    5.3
  15..16      0.266    729.4    269.6   0.0642   0.0180   0.2797   13.1   75.4
  16..3       0.139    729.4    269.6   0.0642   0.0094   0.1459    6.8   39.4
  16..4       0.104    729.4    269.6   0.0642   0.0071   0.1099    5.1   29.6
  15..17      0.179    729.4    269.6   0.0642   0.0121   0.1882    8.8   50.7
  17..18      0.556    729.4    269.6   0.2403   0.1000   0.4162   73.0  112.2
  18..19      0.232    729.4    269.6   0.2403   0.0417   0.1736   30.4   46.8
  19..6       0.204    729.4    269.6   0.2403   0.0366   0.1524   26.7   41.1
  19..20      0.273    729.4    269.6   0.2403   0.0491   0.2041   35.8   55.0
  20..8       0.131    729.4    269.6   0.2403   0.0235   0.0979   17.2   26.4
  20..9       0.204    729.4    269.6   0.2403   0.0367   0.1528   26.8   41.2
  18..21      0.162    729.4    269.6   0.2403   0.0291   0.1212   21.2   32.7
  21..7       0.151    729.4    269.6   0.2403   0.0271   0.1127   19.8   30.4
  21..10      0.168    729.4    269.6   0.2403   0.0301   0.1254   22.0   33.8
  17..22      0.342    729.4    269.6   0.0949   0.0319   0.3357   23.2   90.5
  22..11      0.303    729.4    269.6   0.0949   0.0282   0.2976   20.6   80.2
  22..12      0.379    729.4    269.6   0.0949   0.0354   0.3728   25.8  100.5

tree length for dN:      0.56167
tree length for dS:      3.99725

dS tree:
(X02152Hom: 0.182422, U07178Sus: 0.160033, (M22585rab: 0.287445, ((NM017025Rat: 0.145944, U13687Mus: 0.109885): 0.279740, (((AF070995C: 0.152387, (X04752Mus: 0.097947, U07177Rat: 0.152845): 0.204148): 0.173554, (U95378Sus: 0.112736, U13680Hom: 0.125384): 0.121188): 0.416167, (X53828OG1: 0.297568, U28410OG2: 0.372824): 0.335682): 0.188217): 0.019581): 0.061554);
dN tree:
(X02152Hom: 0.011715, U07178Sus: 0.010277, (M22585rab: 0.018459, ((NM017025Rat: 0.009372, U13687Mus: 0.007057): 0.017964, (((AF070995C: 0.036625, (X04752Mus: 0.023541, U07177Rat: 0.036735): 0.049065): 0.041712, (U95378Sus: 0.027095, U13680Hom: 0.030135): 0.029127): 0.100022, (X53828OG1: 0.028239, U28410OG2: 0.035380): 0.031856): 0.012087): 0.001257): 0.003953);

w ratios as labels for TreeView:
(X02152Hom #0.0642 : 0.011715, U07178Sus #0.0642 : 0.010277, (M22585rab #0.0642 : 0.018459, ((NM017025Rat #0.0642 : 0.009372, U13687Mus #0.0642 : 0.007057) #0.0642 : 0.017964, (((AF070995C#1 #0.2403 : 0.036625, (X04752Mus#1 #0.2403 : 0.023541, U07177Rat#1 #0.2403 : 0.036735)#1 #0.2403 : 0.049065)#1 #0.2403 : 0.041712, (U95378Sus#1 #0.2403 : 0.027095, U13680Hom#1 #0.2403 : 0.030135)#1 #0.2403 : 0.029127)#1 #0.2403 : 0.100022, (X53828OG1#2 #0.0949 : 0.028239, U28410OG2#2 #0.0949 : 0.035380)#2 #0.0949 : 0.031856) #0.0642 : 0.012087) #0.0642 : 0.001257) #0.0642 : 0.003953);


Time used:  1:10

In addition, carry out likelihood ratio tests (LRT) of the hypotheses below. See the lecture notes for additional details about LRTs. Use 1 degree of freedom to obtain the P-value for each LRT.

Summary table of hypotheses:
Model ω_A0 ω_A1 ω_C1 ω_C0 lnL LRT
H0 0.1367 0.1367 0.1367 0.1367 -6018.633
H1 0.1319 0.1319 0.1319 0.192 -6017.572
H2 0.0755 0.0755 0.2394 0.2394 -5985.635
H3 0.0949 0.0642 0.2403 0.2403 -5984.111

You should change the name of the main result file (via outfile= in the control file) or you will overwrite your previous results.

Exercise 4The objective of this exercise is to use a series of LRTs to test for sites evolving under positive selection in the nef gene. If you find significant evidence for positive selection, then identify the involved sites by using empirical Bayes methods.

       seqfile = seqfile.txt          * sequence data filename

    * treefile = treefile_M0.txt      * SET THIS for tree file with ML branch lengths under M0
    * treefile = treefile_M1.txt      * SET THIS for tree file with ML branch lengths under M1
    * treefile = treefile_M2.txt      * SET THIS for tree file with ML branch lengths under M2
    * treefile = treefile_M3.txt      * SET THIS for tree file with ML branch lengths under M3
    * treefile = treefile_M7.txt      * SET THIS for tree file with ML branch lengths under M7
    * treefile = treefile_M8.txt      * SET THIS for tree file with ML branch lengths under M8

      outfile = results.txt           * main result file name
        noisy = 9                     * lots of rubbish on the screen
      verbose = 1                     * detailed output
      runmode = 0                     * user defined tree
      seqtype = 1                     * codons
    CodonFreq = 2                     * F3X4 for codon ferquencies
        model = 0                     * one omega ratio for all branches

    * NSsites = 0                     * SET THIS for M0
    * NSsites = 1                     * SET THIS for M1
    * NSsites = 2                     * SET THIS for M2
    * NSsites = 3                     * SET THIS for M3
    * NSsites = 7                     * SET THIS for M7
    * NSsites = 8                     * SET THIS for M8

        icode = 0                     * universal code
    fix_kappa = 1                     * kappa fixed
      * kappa = 4.43491               * SET THIS to fix kappa at MLE under M0
      * kappa = 4.39117               * SET THIS to fix kappa at MLE under M1
      * kappa = 5.08964               * SET THIS to fix kappa at MLE under M2
      * kappa = 4.89033               * SET THIS to fix kappa at MLE under M3
      * kappa = 4.22750               * SET THIS to fix kappa at MLE under M7
      * kappa = 4.87827               * SET THIS to fix kappa at MLE under M8

    fix_omega = 0                     * omega to be estimated 
        omega = 5                     * initial omega

      * ncatG = 3                     * SET THIS for 3 site categories under M3         
      * ncatG = 10                    * SET THIS for 10 of site categories under M7 and M8

  fix_blength = 2                     * fixed branch lengths from tree file
sdsf
PAUP* advice from David Swafford:
-------------------------
Things that you would examine with your own analyses:
The information that you really want from the program are the best trees found in each search replicate and the globally best across all replicates. After each individual replicate finishes, the best trees from all of the replicates completed thus far are written to the .best.all.tre file. When all replicates have finished, the best tree across all replicates is written to the .best.tre file.
The config files used here are set up to use a feature of the program that collapses internal branches that have an MLE length of zero. This may result in final trees that have polytomies. This is generally the behavior that one would want. Note that the likelihoods of the trees will be identical whether or not the branches are collapsed.
When the two search replicates have completed, we can more closely examine the results.
First, take a look at the end of the .screen.log file. You will see a report of the scores of the final tree from each search replicate, an indication of whether they are the same topology, and a table comparing the parameter values estimated on each final tree.
There are two possibilities:
  1. The search replicates found the same best tree. You should see essentially identical lnL values and parameter estimates. The screen.log file should indicate that the trees are identical.
  2. The search replicates found two different trees. This is almost certainly because one or both were trapped in local topological optima. You will notice that the lnL values are somewhat different and the parameter estimates will be similar but not identical. The search settings may influence whether searches are entrapped in local optima, but generally the default settings are appropriate.
We can also evaluate and compare the results of our two search replicates in PAUP*. Being able to open the results of one program in another for further analysis is a good skill to have.

Monday, July 20, 2015

*Most of this text is from the MBL tutorial for MSA. I am recording what I find and important points, in case the site gets lost over time.

Alignments with nucleotides: MUSCLE V. mafft
mafft --retree 2 1ped.fasta > mafft_dna.fasta

muscle -verbose -log muscle_dna.log -in 1ped.fasta -out muscle_dna.fasta

Compare the alignments resulting from A steps and B steps. Are they different? How many columns are in each the MAFFT or the MUSCLE alignment? What may be wrong with both? (Hint: these are protein coding genes).
They were not well-aligned, there were indels that were not multiples of three and they are protein coding genes!
Do the topologies and/or branch lengths differ? (Hint: look up some species names to get an idea of the expected topology!) Yes--very much so.


Bottom line: Don't do nucleotide alignments on protein-based sequences!

Comparing what happens with gaps:

muscle -verbose -log muscle_gap-20.log -in 1ped_aa.fasta -out muscle_aa_gap-20.fasta -gapopen -20
muscle -verbose -log muscle_gap-1.log -in 1ped_aa.fasta -out muscle_aa_gap-1.fasta -gapopen -1

Compare the modified gap penalty alignments to the default one. How do they differ? Which has the most gaps? Can you guess the order of magnitude of the default gap penalty? Can you find in the log files of MUSCLE the gap penalty use?
The higher gap penalty makes for a much more compressed alignment and helps to take care of this over "indel" issue. The default for MUSCLE is -2.9.


Alignment extension

   papara -t example.tre -s aln.phy -q sequence1.fasta -n run1


   papara -t example.tre -s aln.phy -q sequence2.fasta -n run2


How does the alignment look? Any concerns? How does the requirement for testing for sequence homology before site alignment apply here?
There are big tails at the ends of the new sequence. The alignment may have been trimmed previously, or the previous group used a different set of primers. The middle part is probably homologous, but a lot of times the ends may not line up perfectly. Always check your alignment.

   pagan --ref-seqfile aln.fasta -t example.tre -q sequence1.fasta -o pagan1

  pagan --ref-seqfile aln.fasta -t example.tre -q sequence2.fasta -o pagan2

Pagan requires input alignments in fasta rather than phylip format, but we have provided that in aln.fasta Try typing only 'pagan' to see the full explanation of parameters Compare the pagan alignments for the two sequences you downloaded.
What happens when you try to align the second sequence? HINT: Look at the output to screen. Why?
Adding the second sequence fails. Pagan is helping us to prevent alignments of completely random sequence! Thanks Pagan!

Both papara and pagan can align short reads as well as full length sequences, but caution about homology is particularly appropriate in these cases.

Codon Alignments

This is not part of the exercises, it's just for your future information.
As you now know, it is not appropriate to align nucleotides of protein coding regions. In the exercises above, you translated the nucleotides to amino acids which you could use to infer trees. But sometimes you want to analyze nucleotides that have been aligned by codon. So how do you go from an amino acid alignment back to codon-aligned nucleotides? We suggest the Pal2Nal server. You will upload your protein alignment and nucleotide sequences, and it will spit out the codon alignment. Please be aware that your nucleotides must be multiples of three (i.e. a full open reading frame).

FASTA tutorial

Identifying homologs and non-homologs; effects of scoring matrices and algorithms; using domain annotations

1. Use the FASTA search page [pgm] to compare Honey bee glutathione transferase D1 NP_001171499/ H9KLY5_APIME [seq] (gi|295842263) to the PIR1 Annotated protein sequence database. Be sure to press, not .
  1. Take a look at the output.
    1. How long is the query sequence?
      1. 217 amino acids
    2. How many sequences are in the PIR1 database?
      1. 13,143 sequences
    3. What scoring matrix was used?
      1. BL50 matrix
    4. What were the gap penalties? (what is the penalty for a one-residue gap? two residues?)
      1. one residue: -12 open gap, the -14 for two residues
    5. What are each of the numbers after the description of the library sequence? Which one is best for inferring homology?
      1. sp|P20432|GSTT1_DROME Glutathione S-transferase 1-1; DD ( 209)
      2. 209 is referring to the length of sequence that it matches
    6. How similar is the highest scoring sequence? What is the difference between %_id and %_sim? Why is there no 100% identity match?
      1. 81% similar,  but 57% identity
      2. The sequence is not in the database
    7. Looking at an alignment, where are the boundaries of the alignment (the best local region)? How many gaps are in the best alignment? The second best?
      1. Local alignment, score gets worse if you add more residue so important to consider
      2. There are no gaps in the first, there are 14 in the second
  2. What is the highest scoring non-homolog? (The non-homolog with the highest alignment score, or the lowest E()-value.) How would you confirm that your candidate non-homolog was truly unrelated? (Hint - compare your candidate non-homolog with SwissProt for a more comprehensive test.)
    1. Highest scoring unrelated sequence should have an E() value of 1 (or around 1). Can do "general re-search of this sequence and make sure it does not find your originally queried value. 
  3. Homology (and non-homology) can also be inferred by domain relationships. Try the same Honey bee GSTD1 search [pgm] search using the Annotate Query and Database Annotations: set to show .
    1. Does the domain display change your mind about the highest scoring non-homolog?
      1. Results with E() of 0.11 show completely different shared domains and some with an even higher E() show a shared domain. Often say NODOM Q-values <30 have no domain.
    2. There are three parts to the domain display, the domain structure of the query (top) sequence (if available), the domain structure of the library (bottom) sequence, and the domain alignment boundaries in the middle (inside the alignment box). The boundaries and color of the alignment domain coloring match the Region: sub-alignment scores.
      1. Based on the score, the second part of the domain is not significant. Sometimes the homolog is "too far away" (evolutionary distance) to be given a high score.  
      2. Domains have characteristic length (about 100 residues). Look to see if there is room for a domain, if it is not contributing to the score, it could just be too far away.
      3. Use transitivity from other sequences to make decision it is there.
    3. Note that the alignment of Honey bee GSTD1 and SSPA_ECO57 includes portions of both the N-terminal and C-terminal domains, but neither domain is completely aligned. Why do you think the alignments do not include the complete domains?
    4. Is your explanation for the partial domain alignment consistent the the argument that domains have a characteristic length? How might you test whether a complete domain is present?
    5. The FASTA programs can partition an alignment score based domain boundaries, and report the amount of score associated with a domain. In the SSPA_ECO57 alignment, how much of the score comes from the GST-N terminal domain? The GST-C terminal domain? Does this alignment provide strong support for the presence of a GST-C terminal domain? How might you test for the presence of the domain on GSTD1? On SSPA_ECO57?In the subalignment scores, the Q value is -10 * log(p) for the sub-alignment score, so Q=30.0 means p < 0.001.
  4. Repeat the GSTD1 search [pgm] using the BLASTP62/-11/-1 scoring matrix  that BLAST uses. Re-examine the GSTD1:SSPA_ECO57 alignment. Are both Glutathione transferase domains present? Look at the alignments to the homologs above and below SSPA_ECO57. Based on those aligments, do you think the Glutathione-S-Trfase C-like domain is really missing? Why did the alignment become so much shorter?
    1. SKIP
  5. One of the candidate non-homologs is sp|Q9SI20|EF1D2_ARATH, with an E()-value of 0.11.
    1. Does the domain structure of EF1D2_ARATH suggest that it could be a glutatione transferase homolog?
      1. No, it looks very different
      2. Only residues 8-60 overlapped
    2. Use the General Research to explore the domains contained in EF1D2_ARATH homologs found in SwissProt.
    3. Does this secondary search support homology or non-homology?

      1. Focusing on the NODOM, it actually just isn't annotated! Scorlling down you can see that a glutathion homolog eventaully does show up in the annotation!!



Trust annotations when they say something, but don't trust them if they don't say something. 

2. Exploring domains and alignment over-extension -- cortactin (SRC8_HUMAN)
Compare SRC8_HUMAN [pgm] (human cortactin) to the SwissProt protein sequence database.

  1. Looking at the top five alignments, how many cortactin orthologs do you see? (ortholog, same protein, different species).
    1. Four orthologs
  2. In the SRC8 HUMAN:CHICK alignment, both the query and the subject (library) sequences align seven cortactin domains and an SH3 domain. In addition, two regions (one before the cortactin domain cluster and one after) are well conserved, but do not have annotated domains (NODOM). Are these non-domain (NODOM) regions as well conserved as the annotated domains?

  3. Look at the SRC8_HUMAN:HCLS1_MOUSE alignment. How many cortactin domains does HCLS1_MOUSE contain? How much score does the NODOM spanning the region between cortactin domains and the SH3 domain contribute? Why is it included in the alignment? Is it likely to be homologous?
    1. It is a homolog, but not an ortholog--it is a paralog! We already found mouse domain; only 4 shared domain too. 
  4. Is the NODOM between the cortactin domains and the SH3 domain likely to be homologous in the SRC8_HUMAN:DBNLB_XENLA alignment?
    1. No. The domain score of the second, non annotated domain has a poor score (even negative!). Still needs to be included to include the SH3 domain. No SH3 domain, the alignment would stop.
  5. In the SRC8_HUMAN:LASP1_HUMAN alignment, the alignment extends to include several Nebulin_repeat domains. Do you think there is a Nebulin_repeat domain in SRC8_HUMAN? Why do you think those domains are aligning?
  6. What scoring matrix should be used to reduce over-extension from the SH3 domain?

3. Exploring domains and over-extension with local alignments -- death associated protein kinase (DAPK1_HUMAN)
  1. Look up the domain structure of DAPK1_HUMAN at Pfam [pgm].
    1. What are the major (PfamA) domain regions on the protein?
    2. Which of the domains is repeated?
    3. In a local (LALIGN) alignment, where would you expect to see overlapping domains like those in Calmodulin (CALM_HUMAN) and Cortactin (DAPK1_HUMAN)?
  2. Use lalign/plalign [pgm] to examine local similarities between DAPK1_HUMAN and itself. Check the options to "annotate sequence 1 domains" and "annotate sequence 2 domains". Do you see the domains you expected from Pfam? Do they map in the same places?


Saturday, July 18, 2015

After running GARLI, I wanted to compare
OR137:
OR213: lots of disagreements; redo with 100
OR4: lots of disagreements; redo with 100
OR589: lot
OR6: all agree after 8 runs
OR10: all agree after 8 runs
OR11: tree2 disagrees, 7 others agree, use this as main gene tree
OR51: lots of disagreements; redo with 100
OR52: lots of disagreements; redo with 100

100 is a lot of tree to compare, so I wrote a small script doing so:

setwd("/Volumes/Yango/Hayden_batORs/garli/OR51/trees")

#tell where to open up all of your downloaded files from CIPRES
my.files<-list.files("/Volumes/Yango/Hayden_batORs/garli/OR51/trees/")
#read in all of these trees to a list using read.nexus function
my.trees<-lapply(my.files, read.nexus)

#create a matrix to store all of the tree comparisons
tree.matrix<-matrix(nrow = length(my.trees), ncol = length(my.trees))

for(i in 1:length(my.trees)){ #for every tree in the file
  this.tree<-as.phylo(my.trees[[i]]) #look at that tree
  #compare the topology to all the other trees
  tree.matrix[i,]<-sapply(my.trees, function(my.trees) dist.topo(this.tree,my.trees))
}
tree.matrix

#print out the ones that agree with each other
for(i in 1:length(tree.matrix[1,])){
  this.list<-as.numeric(which(tree.matrix[i,]==0))
  print(this.list)
}
write.csv(tree.matrix, "tree.matrix.csv")

For OR51, are things looking better?
[1] 1
[1] 2
[1] 3
[1]  4 84
[1] 5
[1] 6
[1] 7
[1]  8 15 41 79
[1]  9 14
[1] 10 54
[1] 11
[1] 12
[1] 13 73
[1]  9 14
[1]  8 15 41 79
[1] 16
[1] 17 20 75
[1] 18
[1] 19
[1] 17 20 75
[1] 21 51 68
[1] 22 37
[1] 23 50 71
[1] 24
[1] 25
[1] 26
[1] 27
[1] 28 69 88
[1] 29
[1] 30
[1] 31
[1] 32 46
[1] 33
[1] 34 70
[1] 35 53
[1] 36
[1] 22 37
[1] 38 59
[1] 39 47 78
[1] 40
[1]  8 15 41 79
[1] 42
[1] 43
[1] 44
[1] 45
[1] 32 46
[1] 39 47 78
[1] 48
[1] 49
[1] 23 50 71
[1] 21 51 68
[1] 52
[1] 35 53
[1] 10 54
[1] 55
[1] 56
[1] 57
[1] 58
[1] 38 59
[1] 60
[1] 61
[1] 62 86 87
[1] 63
[1] 64
[1] 65
[1] 66 95
[1] 67
[1] 21 51 68
[1] 28 69 88
[1] 34 70
[1] 23 50 71
[1] 72
[1] 13 73
[1] 74
[1] 17 20 75
[1] 76
[1] 77
[1] 39 47 78
[1]  8 15 41 79
[1] 80
[1] 81
[1] 82
[1] 83
[1]  4 84
[1] 85
[1] 62 86 87
[1] 62 86 87
[1] 28 69 88
[1] 89
[1] 90
[1] 91
[1] 92
[1] 93
[1] 94
[1] 66 95
[1] 96
[1] 97
[1] 98

Womp, womp...not really--only a maximum of 4 trees out of 100 are agreeing.

OR4s looking much better:
[1]  1 22 28 42 75
[1]  2 99
[1]  3 14 18 83 89
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
[1] 5
[1] 6
[1]  7 15 98
[1] 8
[1]  9 19 38 80
[1] 10 50
[1] 11 67
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
[1] 13 86
[1]  3 14 18 83 89
[1]  7 15 98
[1] 16
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
[1]  3 14 18 83 89
[1]  9 19 38 80
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
[1] 21 31 44 62 84
[1]  1 22 28 42 75
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
[1] 25
[1] 26
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
[1]  1 22 28 42 75
[1] 29
[1] 30 33
[1] 21 31 44 62 84
[1] 32 37 61 68 81
[1] 30 33
[1]  34 100
[1] 35
[1] 36
[1] 32 37 61 68 81
[1]  9 19 38 80
[1] 39 71
[1] 40 49
[1] 41
[1]  1 22 28 42 75
[1] 43
[1] 21 31 44 62 84
[1] 45
[1] 46
[1] 47
[1] 48
[1] 40 49
[1] 10 50
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
[1] 52
[1] 53
[1] 54
[1] 55
[1] 56 90
[1] 57
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
[1] 60
[1] 32 37 61 68 81
[1] 21 31 44 62 84
[1] 63
[1] 64
[1] 65
[1] 66
[1] 11 67
[1] 32 37 61 68 81
[1] 69
[1] 70
[1] 39 71
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
[1] 73
[1] 74
[1]  1 22 28 42 75
[1] 76
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
[1] 78
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
[1]  9 19 38 80
[1] 32 37 61 68 81
[1] 82
[1]  3 14 18 83 89
[1] 21 31 44 62 84
[1] 85
[1] 13 86
[1] 87
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
[1]  3 14 18 83 89
[1] 56 90
[1] 91
[1] 92
[1] 93
[1] 94
[1] 95
 [1]  4 12 17 20 23 24 27 51 58 59 72 77 79 88 96
[1] 97
[1]  7 15 98
[1]  2 99
[1]  34 100