Thursday, July 7, 2016

installing mysql

I am hoping to run a set of scripts that will identify and index olfactory receptors using BLAST and mysql. That means I need to sort out how mysql is installed on noctilio, which should be great fun.

First thing I noticed when I downloaded mysql, is that we have 3 other version of mysql on the server! :)

I knew this was going to cause some problems so I needed to uninstall all versions of MySQL in order to get this working properly. I basically followed this awesome post. Here it is reiterated for Mac OsX Yosemite:

Remove MySQL completely per The Tech Lab

  • p-ax | grep mysql

  • stop and kill any MySQL processes
  • brew remove mysql
  • brew cleanup
  • sudo rm /usr/local/mysql
  • sudo rm -rf /usr/local/var/mysql
  • sudo rm -rf /usr/local/mysql*
  • sudo rm ~/Library/LaunchAgents/homebrew.mxcl.mysql.plist
  • sudo rm -rf /Library/StartupItems/MySQLCOM
  • sudo rm -rf /Library/PreferencePanes/My*
  • launchctl unload -~/Library/LaunchAgents/homebrew.mxcl.mysql.plist
  • edit /etc/hostconfig and remove the line MYSQLCOM=-YES-
  • rm -rf ~/Library/PreferencePanes/My*
  • sudo rm -rf /Library/Receipts/mysql*
  • sudo rm -rf /Library/Receipts/MySQL*
  • sudo rm -rf /private/var/db/receipts/*mysql*
  • restart your computer just to ensure any MySQL processes are killed
  • try to run mysql, it shouldn't work

Brew install MySQL per user Sedorner from this StackOverflow answer

  • brew doctor and fix any errors
  • brew update
  • brew install mysql
  • unset TMPDIR
  • mysqld -initialize --verbose --user=whoami --basedir="$(brew --prefix mysql)" --datadir=/usr/local/var/mysql --tmpdir=/tmp #note the above command has been edited to replace the deprecated form
  • mysql.server start
  • run the commands Brew suggests, add MySQL to launchctl so it automatically launches at startup
Okay now to have MySQL start at launch, follow this lovely post. Homebrew can basically set this up for you.
brew tap homebrew/services
Apparently it is no longer supported, so also run these commands:
brew untap homebrew/boneyard brew tap gapple/services

When you run this command, mysql should start:
brew services start mysql

Thursday, May 26, 2016

How to blast against a transcriptome

A guest post by Ramatu Abubakar.
  • Log into the server and make your own folder.
  • Import the fasta file you want to blast against the transcriptome. For example, "Mc1r_Fasta.txt"
Creating your own database::
  • Under your folder, make a single folder for all your transcriptome fasta files. 
  • Make a new folder and move one transcriptome file under that folder. For example my folder was called "PE111_Cabr_VNO_trinity_output" and my transcriptome file was "PE111_Cabr_VNO_trinity_output.Trinity.fasta"

Command for creating database:
  • cd into the folder containing the transcriptome file you want to make the database for.
Type in the following:

makeblastdb  -in (transcriptome file name) -title (name of the folder contain the transcriptome)           -dbtype (prot for a database of proteins and nucl for a database of DNA or RNA) -out (name of your output).

What it means:
  • Makeblastdb tells the blast program to create a database.
  • in represents the input file.
  • title represents the title for the blast database to be created.
  • dbtype tells the blast program whether it is a protein or nucleotide sequence.
  • out represents the name of each database created. You can call it anything you want.
Example below:

105-238:PE111_Cabr_VNO_trinity_output grads$ makeblastdb -in PE111_Cabr_VNO_trinity_output.Trinity.fasta -title PE111_Cabr_VNO_trinity_output -dbtype nucl -out PE111_Cabr_VNO_trinity_output.Trinity.aa


Building a new DB, current time: 05/26/2016 13:18:13
New DB name:   /Users/grads/Ramatu_new_transcriptome/PE111_Cabr_VNO_trinity_output/PE111_Cabr_VNO_trinity_output.Trinity.aa
New DB title:  PE111_Cabr_VNO_trinity_output
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 90410 sequences in 5.0665 seconds.

You should get results in your folder (make sure to refresh).

"nhr is the header file, nin is the index file and nsq is the sequence file. You dont really have to know this. Blast 'just' needs this"

Blasting against the transcriptome database:
  • cd back into your main folder
Type in the following:
blastn (use blast for nucleotide sequence and blastp for protein sequence) -query (fast file you want to search in the transcriptome) -db (database name created) -out (anything you want your output file to be called)

Building a new DB, current time: 05/26/2016 13:18:13
New DB name:   /Users/grads/Ramatu_new_transcriptome/PE111_Cabr_VNO_trinity_output/PE111_Cabr_VNO_trinity_output.Trinity.aa
New DB title:  PE111_Cabr_VNO_trinity_output
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 90410 sequences in 5.0665 seconds.
105-238:PE111_Cabr_VNO_trinity_output grads$ cd
105-238:~ grads$ cd Ramatu_new_transcriptome
105-238:Ramatu_new_transcriptome grads$ ls
Mc1r_Fasta.txt PE111_Cabr_VNO_trinity_output new_transcriptomes
105-238:Ramatu_new_transcriptome grads$ blastn -query Mc1r_Fasta.txt -db /Users/grads/Ramatu_new_transcriptome/PE111_Cabr_VNO_trinity_output/PE111_Cabr_VNO_trinity_output.Trinity.aa -out Mc1r_blastn_1.txt
105-238:Ramatu_new_transcriptome grads$ 

If done correctly, you should get a new result file. (Make sure to refresh)



Thursday, March 3, 2016

installing FastCodeML

Trying out a program called FastCodeML to try to get faster estimates of my PAML simulations. Joe Parker wrote a nice little summary on this. However, contrary to Joe's "nothing a little make make install can't handle", it actually...can't handle it.

Installation:
This is optimized for a Linux environment so if you have a Linux machine, you can just run the binary in the downloaded folder.

First visit: ftp://ftp.vital-it.ch/tools/FastCodeML/ and download FastCodeML-1.1.0.tar.gz. (Not completely obvious at first).

Navigate the directory and convince yourself that you can't run the binary.
105-238:FastCodeML-1.1.0 loloyohe$ file fast
fast: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.18, stripped
Nope, sorry..we have to build from source.

The installation guidelines are vague and basically useless.
Here are the install instructions:
Requirements to generate the executable:
* C++ compiler, e.g. GCC 4
* CMake 2.8.0 (including ccmake) or later recommended, although compilation possible without
* Boost::Spirit, see http://boost-spirit.com/home/
* Reasonably new BLAS implementation (e.g. OpenBLAS, Goto2, ACML, MKL); packages from various Linux distributions can be used, but this deteriorates performance; recommended: OpenBLAS (http://xianyi.github.io/OpenBLAS/) or Intel MKL
* Reasonably new LAPACK library (e.g. original LAPACK or ACML, MKL); packages from various Linux distributions can be used, but this deteriorates performance

How to generate the FastCodeML executable:
* Generate BLAS if necessary
* Generate LAPACK if necessary
* Generate NLopt library (http://ab-initio.mit.edu/wiki/index.php/NLopt)
* Edit CMakeLists.txt if necessary
* Set paths for libraries (change and execute SETPATHS)
* "ccmake ." and switch USE_MPI and USE_OPENMP on/off (other default settings should be ok)
* make will create an executable "fast"

Computer system:
* Linux preferred, but sources are portable to other platforms

And here are further "detailed instructions in the extra_install_doc folder for "Mac_Pro":
*) Create NLOpt in ~/lib
./configure --prefix=/home/mac/lib/nlopt
make
make install

*) Copy BLAS and LAPACK to ~/lib
cp libblas.a ~/lib 
cp liblapack.a ~/lib

*) Setting environment variables
export BLAS_LIB_DIR="/home/mac/lib" #we want to flexible here, hence we do not specify /usr/lib
export LAPACK_LIB_DIR="/home/mac/lib" #we want to flexible here, hence we do not specify /usr/lib
export NLOPT_LIB_DIR="/home/mac/lib/nlopt/lib"
export NLOPT_INCLUDE_DIR="/home/mac/lib/nlopt/include"
export MATH_LIB_NAMES="blas;lapack;lapack;blas;gfortranbegin;gfortran"
#export MPI_INCLUDE_PATH="/usr/include" #might not be necessary if CXX set correctly
#export MPI_LIBRARY="/usr/lib" #might not be necessary if CXX set correctly
export CXX="/usr/bin/mpicxx.mpich2" #remember to run as mpirun.mpich2 -np 2 ./fast

This is bringing back a past "lapack" nightmare I had several months ago. Looks like its going to be great fun.

We have a "homebrew" environment. While mostly good, it has confused installers about where our libraries are and which compiler to use. That, and along with Mavericks OSX being horribly configured make this extra challenging. I first try to brew install everything they ask for. Basically, if you have a homebrew setup, don't do anything the instructions tell you.

Really could only get boost installed
105-238:FastCodeML-1.1.0 loloyohe$ brew install boost
Lapack and BLAS fail miserably. Luckily, these seem to be optional so when we go to ./configure, we can turn these things off. Instead of a normal make/install setup, this program is set up for ccmake (note: NOT cmake). Fun! This means when you run it, it is looking for "CMakeLists.txt". This is the file you edit.

After much hair-pulling, here is what my CMakeLists.txt. (Everything else stayed the same).
# Get the configuration switches
OPTION(USE_LAPACK         "Use BLAS/LAPACK" OFF)
OPTION(USE_MKL_VML         "Use Intel MKL vectorized routines" OFF)
OPTION(USE_OPENMP         "Compile with OpenMP support" OFF)
OPTION(USE_MPI             "Use MPI for high level parallelization" OFF)
if(NOT WIN32)
OPTION(BUILD_NOT_SHARED   "Build FastCodeML not shared" OFF)
endif(NOT WIN32)
OPTION(BUILD_SEARCH_MPI   "Search for MPI installation?" OFF)
OPTION(USE_ORIGINAL_PROPORTIONS "Use the original CodeML proportion definition" OFF)
SET(USE_LIKELIHOOD_METHOD "Original" CACHE STRING "Select the type of likelihood computation method: Original, NonRecursive, FatVector, DAG")
SET_PROPERTY(CACHE USE_LIKELIHOOD_METHOD PROPERTY STRINGS Original NonRecursive FatVector DAG)
OPTION(USE_IDENTITY_MATRIX "Force identity matrix when time is zero" OFF)
OPTION(USE_CPV_SCALING "Scale conditional probability vectors to avoid under/overflow" OFF)

Now "ccmake" was a new experience for me. It wasn't working when I just typed "ccmake ." as instructed. I don't know why. But anyways, if i did:
105-238:FastCodeML-1.1.0 loloyohe$ ccmake /Applications/FastCodeML-1.1.0/
then a new interface shows up in the terminal, basically showing what I had switched on and off. It was not intuitive to me what was happening, but I just kept pressing "enter" "n" and "g" until finally I got out of the screen. Makefile, can I haz? I can haz!!!

105-238:FastCodeML-1.1.0 loloyohe$ make
Scanning dependencies of target fast
[  4%] Building CXX object CMakeFiles/fast.dir/fast.cpp.o
[  8%] Building CXX object CMakeFiles/fast.dir/CmdLine.cpp.o
[ 12%] Building CXX object CMakeFiles/fast.dir/Genes.cpp.o
[ 16%] Building CXX object CMakeFiles/fast.dir/Phylip.cpp.o
[ 20%] Building CXX object CMakeFiles/fast.dir/PhyloTree.cpp.o
[ 24%] Building CXX object CMakeFiles/fast.dir/Newick.cpp.o
[ 28%] Building CXX object CMakeFiles/fast.dir/TreeNode.cpp.o
[ 32%] Building CXX object CMakeFiles/fast.dir/BayesTest.cpp.o
[ 36%] Building CXX object CMakeFiles/fast.dir/FillMatrix.cpp.o
[ 40%] Building CXX object CMakeFiles/fast.dir/Forest.cpp.o
[ 44%] Building CXX object CMakeFiles/fast.dir/TransitionMatrix.cpp.o
[ 48%] Building CXX object CMakeFiles/fast.dir/BranchSiteModel.cpp.o
/Applications/FastCodeML-1.1.0/BranchSiteModel.cpp:381:27: warning: comparison of unsigned expression < 0 is always false
      [-Wtautological-compare]
        else if(aValidLen < 0)
                ~~~~~~~~~ ^ ~
1 warning generated.
[ 52%] Building CXX object CMakeFiles/fast.dir/ProbabilityMatrixSet.cpp.o
[ 56%] Building CXX object CMakeFiles/fast.dir/FatVectorTransform.cpp.o
[ 60%] Building CXX object CMakeFiles/fast.dir/CodonFrequencies.cpp.o
[ 64%] Building CXX object CMakeFiles/fast.dir/AlignedAllocator.cpp.o
/Applications/FastCodeML-1.1.0/AlignedAllocator.cpp:22:10: fatal error: 'malloc.h' file not found
#include <malloc.h>
         ^
1 error generated.
make[2]: *** [CMakeFiles/fast.dir/AlignedAllocator.cpp.o] Error 1
make[1]: *** [CMakeFiles/fast.dir/all] Error 2

make: *** [all] Error 2
We are getting closer. Now its time to get hacky. 
Basically, comment out the malloc.h in anything that uses it.

Open up "AlignedAllocator.cpp".
Change: 
#include <malloc.h>
To:
//#include <malloc.h>

Try again!
105-238:FastCodeML-1.1.0 loloyohe$ make
Scanning dependencies of target fast
[  4%] Building CXX object CMakeFiles/fast.dir/AlignedAllocator.cpp.o
[  8%] Building CXX object CMakeFiles/fast.dir/HighLevelCoordinator.cpp.o
[ 12%] Building CXX object CMakeFiles/fast.dir/CodeMLoptimizer.cpp.o
[ 16%] Building CXX object CMakeFiles/fast.dir/ForestExport.cpp.o
[ 20%] Building CXX object CMakeFiles/fast.dir/ParseParameters.cpp.o
[ 24%] Building CXX object CMakeFiles/fast.dir/VerbosityLevels.cpp.o
[ 28%] Building CXX object CMakeFiles/fast.dir/DAGScheduler.cpp.o
[ 32%] Building CXX object CMakeFiles/fast.dir/TreeAndSetsDependencies.cpp.o
[ 36%] Building CXX object CMakeFiles/fast.dir/WriteResults.cpp.o
[ 40%] Linking CXX executable fast
[100%] Built target fast

There, that's better.
105-238:FastCodeML-1.1.0 loloyohe$ file fast
fast: Mach-O 64-bit executable x86_64

More on execution later.

Wednesday, December 2, 2015

Running RELAX.


We are looking for relaxed selection in Trpc2 of several lineages of bats that have reduced morphology of vomeronasal system organs or brain regions.

To run RELAX:
RELAX is relatively recent and you have to download the most recent batch files for HyPhy.
Start HyPhy.
Open->Open Batch File...
Navigate to Applications/hyphy/res/TemplateBatchFiles/
Select RELAX.bf
It will automatically prompt you to select your data/alignment file.
Then select your tree file.
Make sure the correct number of Ts, Rs, and Us match.
It should hopefully begin, assuming you have your tree set up correctly and your alignment does not have any stop codons AND it is in frame.

Something VERY critical if you are running it in the GUI...you must select the test branches directly when it asks for it, not just when you put them in the tree. Otherwise it estimates the reference branches as the test branches and vice versa.

RELAX1:
Alignment: bat_data_match.nex
Tree: bat_tree_match_RELAX1.tre
(((((((((((((((((((((Artibeus_jamaicensis{R}:2.431373248,Artibeus_obscurus{R}:2.431373248){R}:0.3588412223,Artibeus_fimbriatus{R}:2.79021447){R}:0.9561230126,(Artibeus_intermedius{R}:0.6313580609,Artibeus_lituratus{R}:0.6313580609){R}:3.114979422){R}:1.424889238,Artibeus_inopinatus{R}:5.171226721){R}:0.9218212476,Artibeus_concolor{R}:6.093047969){R}:2.392412914,((((Artibeus_watsoni{R}:2.760434077,Artibeus_incomitatus{R}:2.760434077){R}:1.940812275,Artibeus_phaeotis{R}:4.701246352){R}:0.9218549563,Artibeus_cinereus{R}:5.623101308){R}:0.8637586984,(Artibeus_glaucus{R}:3.316302056,Artibeus_gnomus{R}:3.316302056){R}:3.170557951){R}:1.998600876){R}:2.413693209,(((Ariteus_flavescens{R}:2.487096883,Ardops_nichollsi{R}:2.487096883){R}:2.828400699,Phyllops_falcatus{R}:5.315497583){R}:0.8401618707,((Pygoderma_bilabiatum{R}:3.602361855,Sphaeronycteris_toxophyllum{R}:3.602361855){R}:1.181679598,Centurio_senex{R}:4.784041453){R}:1.371618001){R}:4.743494639){R}:1.13714402,Ectophylla_alba{R}:12.03629811){R}:0.9420299332,Enchisthenes_hartii{R}:12.97832805){R}:1.825921019,((((((((Platyrrhinus_nigellus{R}:2.504001415,Platyrrhinus_dorsalis{R}:2.504001415){R}:0.4677903585,(Platyrrhinus_infuscus{R}:2.110403375,Platyrrhinus_aurarius{R}:2.110403375){R}:0.8613883988){R}:1.095202405,Platyrrhinus_albericoi{R}:4.066994179){R}:2.177118584,Platyrrhinus_lineatus{R}:6.244112763){R}:1.779671702,Vampyrodes_caraccioli{R}:8.023784464){R}:2.695962001,(Vampyressa_thyone{R}:7.747059999,Mesophylla_macconnelli{R}:7.747059999){R}:2.972686466){R}:0.777302653,(((Chiroderma_villosum{R}:1.968379874,Chiroderma_improvisum{R}:1.968379874){R}:1.432916649,Chiroderma_doriae{R}:3.401296523){R}:5.790148924,(Vampyressa_brocki{R}:7.444560681,Vampyressa_bidens{R}:7.444560681){R}:1.746884766){R}:2.305603671){R}:0.737544275,Uroderma_magnirostrum{R}:12.23459339){R}:2.569655671){R}:2.978950491,(((Sturnira_bogotensis{R}:4.351270017,Sturnira_tildae{R}:4.351270017){R}:0.6450817233,Sturnira_magna{R}:4.99635174){R}:0.4749192885,(Sturnira_oporaphilum{R}:0.9163878452,Sturnira_lilium{R}:0.9163878452){R}:4.554883183){R}:12.31192853){R}:1.896666128,Rhinophylla_fischerae{R}:19.67986568){R}:1.622496147,(((((Carollia_perspicillata{R}:2.166884003,Carollia_brevicauda{R}:2.166884003){R}:0.7022808564,Carollia_sowelli{R}:2.869164859){R}:1.049593577,Carollia_subrufa{R}:3.918758436){R}:4.394363478,Carollia_castanea{R}:8.313121914){R}:11.76400773,(Glyphonycteris_sylvestris{R}:15.43079994,Trinycteris_nicefori{R}:15.43079994){R}:4.6463297){R}:1.225232188){R}:1.57140971,((((Lonchophylla_hesperia{R}:9.371624678,Lionycteris_spurrelli{R}:9.371624678){R}:2.706833404,Platalina_genovensium{R}:12.07845808){R}:1.204240126,Lonchophylla_chocoana{R}:13.28269821){R}:2.390290187,Lonchophylla_mordax{R}:15.6729884){R}:7.200783145){R}:1.155625037,((((((Glossophaga_longirostris{R}:5.673316434,Glossophaga_commissarisi{R}:5.673316434){R}:1.473493579,Glossophaga_soricina{R}:7.146810013){R}:4.41746123,(Leptonycteris_nivalis{R}:1.825719598,Leptonycteris_curasoae{R}:1.825719598){R}:9.738551645){R}:6.408254338,(Brachyphylla_pumila{T}:1.377027909,Brachyphylla_cavernarum{T}:1.377027909){T}:16.59549767){R}:1.44044124,(((Musonycteris_harrisoni{T}:1.250127196,Choeronycteris_mexicana{T}:1.250127196){T}:4.862545039,Choeroniscus_godmani{T}:6.112672234){T}:10.9120772,((Anoura_latidens{R}:3.779427833,Anoura_geoffroyi{R}:3.779427833){R}:2.83493221,Anoura_caudifer{R}:6.614360043){R}:10.41038939){R}:2.388217391){R}:3.771706573,(Lonchorhina_aurita{R}:8.795556441,Lonchorhina_inusitata{R}:8.795556441){R}:14.38911695){R}:0.8447231842){R}:0.9614243908,(((((Lophostoma_evotis{R}:2.279691562,Lophostoma_silvicolaFG{R}:2.279691562){R}:7.690705428,Lophostoma_brasiliense{R}:9.97039699){R}:7.599298001,(Phyllostomus_elongatus{R}:8.119582824,Phyllostomus_discolor{R}:8.119582824){R}:9.450112167){R}:1.521869862,(Tonatia_saurophila{R}:4.502755885,Tonatia_bidens{R}:4.502755885){R}:14.58880897){R}:3.674523748,(Chrotopterus_auritus{R}:19.15333438,Mimon_cozumelae{R}:19.15333438){R}:3.61275422){R}:2.224732367){R}:2.433669076,((Diaemus_youngi{R}:12.39244932,Desmodus_rotundus{R}:12.39244932){R}:7.835663972,Diphylla_ecaudata{R}:20.22811329){R}:7.196376756){R}:0.8214752215,(((Micronycteris_microtis{R}:9.018674346,Micronycteris_hirsuta{R}:9.018674346){R}:5.324423535,(Micronycteris_minuta{R}:4.882043938,Micronycteris_schmidtorum{R}:4.882043938){R}:9.461053943){R}:8.513447254,Lampronycteris_brachyotis{R}:22.85654514){R}:5.389420131){R}:7.944937657,((((Pteronotus_quadridens{T}:6.670886247,Pteronotus_macleayii{T}:6.670886247){T}:4.04633203,Pteronotus_davyi{T}:10.71721828){T}:3.138003269,Pteronotus_parnellii{T}:13.85522155){T}:18.91999925,Mormoops_megalophylla{T}:32.7752208){T}:3.415682128){R}:5.869690062,Thyroptera_lavali{T}:42.06059299){R}:13.44742223,((((((((Miniopterus_aelleni{R}:1.299615,Miniopterus_petersoni{R}:1.299615){R}:4.270025,Miniopterus_gleni{R}:5.569641){R}:2.003838,Miniopterus_sororculus{R}:7.573479){R}:2.031566,Miniopterus_minor{R}:9.605045){R}:14.889341,(Miniopterus_fuliginosus{R}:17.236618,Miniopterus_schreibersii{R}:17.236618){R}:7.257769){R}:27.640976,Myotis_elegans{T}:52.13536){R}:1.662856,Molossus_sinaloae{T}:53.798217){R}:0.976298,Chilonatalus_micropus{T}:54.774513){R}:0.873379);


Load this file into HyPhy-vision
Results from the first run (Important results are in the .json file):

Test for selection relaxation (K = 0.43) was significant (p = 0.0000, LR = 46.62)

  • Modellog L# par.AICcTime to fitLtreeBranch setω1ω2ω3
    Partitioned MG94xREV-4116.852188675.524 min. 44 sec.1.33Reference0.102 (100%)
    Test0.367 (100%)
    General Descriptive-4024.514228915.0415 min. 55 sec.3.99All0.156 (100%)0.601 (0.0%)10.7 (0.0%)
    Null-4127.442218702.867 min. 9 sec.4.05Reference0.139 (99%)0.141 (0.48%)53.5 (0.020%)
    Test0.139 (99%)0.141 (0.48%)53.5 (0.020%)
    Alternative-4104.132228658.3034 min. 33 sec.3.99Reference0.0563 (5.6%)0.0763 (94%)3.83 (0.57%)
    Test0.290 (5.6%)0.331 (94%)1.78 (0.57%)
    Partitioned Exploratory-4104.052268666.3612 min. 26 sec.3.99Reference0.0231 (5.9%)0.0743 (93%)3.16 (0.84%)
    Test0.290 (0.0%)0.334 (100%)1.42 (0.0%)









Model (click to enlarge): Alternative (A), General Descriptive (B)


ω distributions under the Alternative model

Test branches are shown in blue and reference branches are shown in red

ω distributions under the Null model

Test branches are shown in blue and reference branches are shown in red

ω distributions under the Partitioned Exploratory model

Test branches are shown in blue and reference branches are shown in red

ω distributions under the Partitioned MG94xREV model

Test branches are shown in blue and reference branches are shown in red


The x-axis of the graphs comes out horribly, but other than that, I'm pleased with the visualization package.

Monday, November 23, 2015

Find the best model of evolution for Trpc2 using modelomatic:
#For this analysis, all stop codons need to be removed.
#For troubleshooting modelOmatic, I wrote a more detailed post here.
#Navigate to the correct directory and run this command:
modelomatic_OSX Trpc2_alignment_noStops.phy bionj Trpc2_modelomatic_normal.txt 0 normal

Interestingly REV+4dG is the best model, despite it being a protein coding gene. This is essentially "GTR+G". The 4dG is showing the four rate categories (this is an arbitrary number) for the gamma parameter.

Find the "best" (ML) tree in GALI using search replicates:
#run this command
garli Trpc2_garli.conf
#Note how to implement GTR+G in GARLI. See here for more examples of other models.
#Here is my configuration file (Trpc2_garli.conf):
[general]
datafname = Trpc2_garli.nex
constraintfile = none
streefname = stepwise
attachmentspertaxon = 50
ofprefix = Trpc2.GTR4DG
randseed = -1
availablememory = 512
logevery = 10
saveevery = 100
refinestart = 1
outputeachbettertopology = 0
outputcurrentbesttopology = 0
enforcetermconditions = 1
genthreshfortopoterm = 20000
scorethreshforterm = 0.05
significanttopochange = 0.01
outputphyliptree = 0
outputmostlyuselessfiles = 0
writecheckpoints = 0
restart = 0
outgroup = 1
resampleproportion = 1.0
inferinternalstateprobs = 0
outputsitelikelihoods = 0
optimizeinputonly = 0
collapsebranches = 1

searchreps = 8
bootstrapreps = 0

[model1]
datatype = nucleotide
ratematrix = 6rate
statefrequencies = estimate
ratehetmodel = gamma
numratecats = 4
invariantsites = none

[master]
nindivs = 4
holdover = 1
selectionintensity = 0.5
holdoverpenalty = 0
stopgen = 5000000
stoptime = 5000000

startoptprec = 0.5
minoptprec = 0.01
numberofprecreductions = 10
treerejectionthreshold = 50.0
topoweight = 1.0
modweight = 0.05
brlenweight = 0.2
randnniweight = 0.1
randsprweight = 0.3
limsprweight =  0.6
intervallength = 100
intervalstostore = 5
limsprrange = 6
meanbrlenmuts = 5
gammashapebrlen = 1000
gammashapemodel = 1000
uniqueswapbias = 0.1
distanceswapbias = 1.0

Putting bootstrap values on GARLI:
You can bootstrap in GARLI, but to "put" the bootstraps on the nodes, you have to summarize the bootstrap results using an outside method. SumTrees.py in the Dendropy python package works well. I basically followed the exact instructions somebody wrote here.

Running bootstrap iterations in GARLI:
#I ran four independent runs simultaneously of 250 bootstrap iterations that will combine to = 1000.
#Here is my GARLI configuration file:
[general]
datafname = Trpc2_garli.nex
constraintfile = none
streefname = stepwise
attachmentspertaxon = 50
ofprefix = Trpc2.GTR4DG.boot
randseed = -1
availablememory = 512
logevery = 10
saveevery = 100
refinestart = 1
outputeachbettertopology = 0
outputcurrentbesttopology = 0
enforcetermconditions = 1
genthreshfortopoterm = 20000
scorethreshforterm = 0.05
significanttopochange = 0.01
outputphyliptree = 0
outputmostlyuselessfiles = 0
writecheckpoints = 0
restart = 0
outgroup = 1
resampleproportion = 1.0
inferinternalstateprobs = 0
outputsitelikelihoods = 0
optimizeinputonly = 0
collapsebranches = 1

searchreps = 1
bootstrapreps = 250

[model1]
datatype = nucleotide
ratematrix = 6rate
statefrequencies = estimate
ratehetmodel = gamma
numratecats = 4
invariantsites = none

[master]
nindivs = 4
holdover = 1
selectionintensity = 0.5
holdoverpenalty = 0
stopgen = 5000000
stoptime = 5000000

startoptprec = 0.5
minoptprec = 0.01
numberofprecreductions = 10
treerejectionthreshold = 20.0
topoweight = 1.0
modweight = 0.05
brlenweight = 0.2
randnniweight = 0.1
randsprweight = 0.3
limsprweight =  0.6
intervallength = 100
intervalstostore = 5
limsprrange = 6
meanbrlenmuts = 5
gammashapebrlen = 1000
gammashapemodel = 1000
uniqueswapbias = 0.1
distanceswapbias = 1.0

Installing dendropy
#We have "pip" set up on our server, making the python package easy to install:
sudo pip install -U dendropy

Running SumTrees
#place all of your bootstrap outputs and your best tree from GARLI in its own folder

cd /Volumes/Yango/Trpc2/garli/

mkdir sumtrees
cp Trpc2.GTR4DG.best.tre sumtrees/
cp Trpc2.GTR4DG.boot.boot.tre sumtrees/
cp Trpc2.GTR4DG.boot2.boot.tre sumtrees/
cp Trpc2.GTR4DG.boot3.boot.tre sumtrees/
cp Trpc2.GTR4DG.boot4.boot.tre sumtrees/

#from within the folder, run sumtrees.py
#your target tree is the "best" tree from your GARLI output

sumtrees.py --decimals=0 --percentages --no-analysis-metainformation --no-taxa-block Trpc2.GTR4DG.boot.boot.tre Trpc2.GTR4DG.boot2.boot.tre Trpc2.GTR4DG.boot3.boot.tre Trpc2.GTR4DG.boot4.boot.tre --target=Trpc2.GTR4DG.best.tre --output=supportOnBest.phy

Viewing tree with bootstraps
Now, open supportOnBest.phy in FigTree. Click on "node labels" and select "support" as the node label. These are your bootstrap values.