Thursday, March 3, 2016

installing FastCodeML

I'm trying out a program called FastCodeML to get faster estimates for my PAML simulations. Joe Parker wrote a nice little summary of it. However, contrary to Joe's "nothing a little make make install can't handle", it actually... can't handle it.

Installation:
This is optimized for a Linux environment, so if you have a Linux machine, you can just run the binary in the downloaded folder.

First visit: ftp://ftp.vital-it.ch/tools/FastCodeML/ and download FastCodeML-1.1.0.tar.gz. (Not completely obvious at first).
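
If you would rather do the download from the terminal, here is a minimal sketch. The helper name fetch_fastcodeml is made up for illustration, and I am assuming that curl on your machine can speak FTP and that the archive unpacks into a folder named after itself (which matches the folder in my prompts below).

```shell
url="ftp://ftp.vital-it.ch/tools/FastCodeML/FastCodeML-1.1.0.tar.gz"
archive="${url##*/}"          # strip everything up to the last slash
src_dir="${archive%.tar.gz}"  # strip the extension (assumed folder name)

# Hypothetical helper: download the archive if it is not here yet, then unpack it.
fetch_fastcodeml() {
  [ -f "$archive" ] || curl -O "$url" || return 1
  tar -xzf "$archive"
}

echo "archive: $archive, unpacks to: $src_dir"
```

Calling fetch_fastcodeml and then cd "$src_dir" should leave you in the source folder.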

Navigate to the directory and convince yourself that you can't run the binary on a Mac.
105-238:FastCodeML-1.1.0 loloyohe$ file fast
fast: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.18, stripped
Nope, sorry... it is a Linux ELF executable, so we have to build from source.
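
The check I did by eye can be written down as a small sketch. The function name binary_format is my own invention; it only looks for the words "ELF" and "Mach-O" in whatever `file` prints, so treat it as a rough test rather than a real format detector.

```shell
# Classify one line of `file` output by executable format.
binary_format() {
  case "$1" in
    *ELF*)    echo "ELF (Linux)" ;;
    *Mach-O*) echo "Mach-O (OS X)" ;;
    *)        echo "unknown" ;;
  esac
}

# Sample input copied from the `file fast` output above.
result="$(binary_format "fast: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux)")"
echo "$result"
```

On a real download you would call it as binary_format "$(file fast)".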

The installation guidelines are vague and basically useless.
Here are the install instructions:
Requirements to generate the executable:
* C++ compiler, e.g. GCC 4
* CMake 2.8.0 (including ccmake) or later recommended, although compilation possible without
* Boost::Spirit, see http://boost-spirit.com/home/
* Reasonably new BLAS implementation (e.g. OpenBLAS, Goto2, ACML, MKL); packages from various Linux distributions can be used, but this deteriorates performance; recommended: OpenBLAS (http://xianyi.github.io/OpenBLAS/) or Intel MKL
* Reasonably new LAPACK library (e.g. original LAPACK or ACML, MKL); packages from various Linux distributions can be used, but this deteriorates performance

How to generate the FastCodeML executable:
* Generate BLAS if necessary
* Generate LAPACK if necessary
* Generate NLopt library (http://ab-initio.mit.edu/wiki/index.php/NLopt)
* Edit CMakeLists.txt if necessary
* Set paths for libraries (change and execute SETPATHS)
* "ccmake ." and switch USE_MPI and USE_OPENMP on/off (other default settings should be ok)
* make will create an executable "fast"

Computer system:
* Linux preferred, but sources are portable to other platforms
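
Before fighting with any of this, it can help to see which of the required command-line tools are already on your PATH. This is only a sketch: I am using c++ as a stand-in for "a C++ compiler", and it says nothing about the libraries (Boost, BLAS, LAPACK, NLopt), which are not programs you can look up this way.

```shell
# Tools named in the requirements list; c++ stands in for "a C++ compiler".
missing=""
for tool in c++ cmake ccmake make; do
  if found="$(command -v "$tool")"; then
    echo "$tool -> $found"
  else
    missing="$missing $tool"
  fi
done
echo "missing:${missing:- none}"
```

The paths it prints also show whether a tool comes from Homebrew or from the system.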

And here are the further "detailed" instructions in the extra_install_doc folder for "Mac_Pro":
*) Create NLOpt in ~/lib
./configure --prefix=/home/mac/lib/nlopt
make
make install

*) Copy BLAS and LAPACK to ~/lib
cp libblas.a ~/lib 
cp liblapack.a ~/lib

*) Setting environment variables
export BLAS_LIB_DIR="/home/mac/lib" #we want to flexible here, hence we do not specify /usr/lib
export LAPACK_LIB_DIR="/home/mac/lib" #we want to flexible here, hence we do not specify /usr/lib
export NLOPT_LIB_DIR="/home/mac/lib/nlopt/lib"
export NLOPT_INCLUDE_DIR="/home/mac/lib/nlopt/include"
export MATH_LIB_NAMES="blas;lapack;lapack;blas;gfortranbegin;gfortran"
#export MPI_INCLUDE_PATH="/usr/include" #might not be necessary if CXX set correctly
#export MPI_LIBRARY="/usr/lib" #might not be necessary if CXX set correctly
export CXX="/usr/bin/mpicxx.mpich2" #remember to run as mpirun.mpich2 -np 2 ./fast

This is bringing back a past "lapack" nightmare I had several months ago. Looks like it's going to be great fun.

We have a "homebrew" environment. While mostly good, it has confused installers about where our libraries are and which compiler to use. That, along with Mavericks OS X being horribly configured, makes this extra challenging. I first tried to brew install everything they ask for. Basically, if you have a Homebrew setup, don't do anything the instructions tell you.

I could really only get Boost installed:
105-238:FastCodeML-1.1.0 loloyohe$ brew install boost
LAPACK and BLAS fail miserably. Luckily, these seem to be optional, so we can turn them off when we configure the build. Instead of a normal ./configure, make, make install setup, this program is set up for CMake, and the instructions drive it through ccmake, the interactive terminal front end (note: NOT plain cmake). Fun! This means that when you run it, it looks for "CMakeLists.txt". This is the file you edit.

After much hair-pulling, here is what my CMakeLists.txt looks like. (Everything else stayed the same.)
# Get the configuration switches
OPTION(USE_LAPACK         "Use BLAS/LAPACK" OFF)
OPTION(USE_MKL_VML         "Use Intel MKL vectorized routines" OFF)
OPTION(USE_OPENMP         "Compile with OpenMP support" OFF)
OPTION(USE_MPI             "Use MPI for high level parallelization" OFF)
if(NOT WIN32)
OPTION(BUILD_NOT_SHARED   "Build FastCodeML not shared" OFF)
endif(NOT WIN32)
OPTION(BUILD_SEARCH_MPI   "Search for MPI installation?" OFF)
OPTION(USE_ORIGINAL_PROPORTIONS "Use the original CodeML proportion definition" OFF)
SET(USE_LIKELIHOOD_METHOD "Original" CACHE STRING "Select the type of likelihood computation method: Original, NonRecursive, FatVector, DAG")
SET_PROPERTY(CACHE USE_LIKELIHOOD_METHOD PROPERTY STRINGS Original NonRecursive FatVector DAG)
OPTION(USE_IDENTITY_MATRIX "Force identity matrix when time is zero" OFF)
OPTION(USE_CPV_SCALING "Scale conditional probability vectors to avoid under/overflow" OFF)

Now "ccmake" was a new experience for me. It wasn't working when I just typed "ccmake ." as instructed. I don't know why. But anyway, if I did:
105-238:FastCodeML-1.1.0 loloyohe$ ccmake /Applications/FastCodeML-1.1.0/
then a new interface shows up in the terminal, basically showing what I had switched on and off. It was not intuitive to me what was happening, but I just kept pressing "enter" "n" and "g" until finally I got out of the screen. Makefile, can I haz? I can haz!!!
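
For anyone who wants to skip the interactive screen, plain cmake accepts the same switches as -D options on the command line. I did not build it this way myself, so take this as an untested sketch. The switch names are copied from my CMakeLists.txt above, and the source path is the one from my machine (override it with SRC_DIR).

```shell
# Build switches taken from the CMakeLists.txt excerpt in this post.
flags="-DUSE_LAPACK=OFF -DUSE_MKL_VML=OFF -DUSE_OPENMP=OFF -DUSE_MPI=OFF"

# Assumed location of the unpacked sources; override with SRC_DIR=...
src_dir="${SRC_DIR:-/Applications/FastCodeML-1.1.0}"

if command -v cmake >/dev/null 2>&1 && [ -f "$src_dir/CMakeLists.txt" ]; then
  # $flags is left unquoted on purpose so each -D option is a separate argument.
  (cd "$src_dir" && cmake $flags .)
else
  echo "cmake not found or no CMakeLists.txt in $src_dir; nothing done"
fi
```

If the configure step succeeds, it should leave a Makefile in the source folder, just like pressing "g" in ccmake.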

105-238:FastCodeML-1.1.0 loloyohe$ make
Scanning dependencies of target fast
[  4%] Building CXX object CMakeFiles/fast.dir/fast.cpp.o
[  8%] Building CXX object CMakeFiles/fast.dir/CmdLine.cpp.o
[ 12%] Building CXX object CMakeFiles/fast.dir/Genes.cpp.o
[ 16%] Building CXX object CMakeFiles/fast.dir/Phylip.cpp.o
[ 20%] Building CXX object CMakeFiles/fast.dir/PhyloTree.cpp.o
[ 24%] Building CXX object CMakeFiles/fast.dir/Newick.cpp.o
[ 28%] Building CXX object CMakeFiles/fast.dir/TreeNode.cpp.o
[ 32%] Building CXX object CMakeFiles/fast.dir/BayesTest.cpp.o
[ 36%] Building CXX object CMakeFiles/fast.dir/FillMatrix.cpp.o
[ 40%] Building CXX object CMakeFiles/fast.dir/Forest.cpp.o
[ 44%] Building CXX object CMakeFiles/fast.dir/TransitionMatrix.cpp.o
[ 48%] Building CXX object CMakeFiles/fast.dir/BranchSiteModel.cpp.o
/Applications/FastCodeML-1.1.0/BranchSiteModel.cpp:381:27: warning: comparison of unsigned expression < 0 is always false
      [-Wtautological-compare]
        else if(aValidLen < 0)
                ~~~~~~~~~ ^ ~
1 warning generated.
[ 52%] Building CXX object CMakeFiles/fast.dir/ProbabilityMatrixSet.cpp.o
[ 56%] Building CXX object CMakeFiles/fast.dir/FatVectorTransform.cpp.o
[ 60%] Building CXX object CMakeFiles/fast.dir/CodonFrequencies.cpp.o
[ 64%] Building CXX object CMakeFiles/fast.dir/AlignedAllocator.cpp.o
/Applications/FastCodeML-1.1.0/AlignedAllocator.cpp:22:10: fatal error: 'malloc.h' file not found
#include <malloc.h>
         ^
1 error generated.
make[2]: *** [CMakeFiles/fast.dir/AlignedAllocator.cpp.o] Error 1
make[1]: *** [CMakeFiles/fast.dir/all] Error 2
make: *** [all] Error 2
We are getting closer. Now it's time to get hacky.
Basically, comment out the malloc.h include in anything that uses it.

Open up "AlignedAllocator.cpp".
Change: 
#include <malloc.h>
To:
//#include <malloc.h>
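
If more than one file turns out to include malloc.h, the same edit can be scripted. In my build only AlignedAllocator.cpp complained, so this sketch runs on a throwaway copy in a temp folder rather than on the real sources. The -i.bak form keeps a backup and should work with both the BSD sed on OS X and GNU sed.

```shell
# Work on a throwaway copy so the real sources are untouched.
demo_dir="$(mktemp -d)"
printf '#include <malloc.h>\n#include <cstdlib>\n' > "$demo_dir/AlignedAllocator.cpp"

# Find every .cpp file that includes malloc.h and comment the include out.
for f in $(grep -l '^#include <malloc\.h>' "$demo_dir"/*.cpp); do
  sed -i.bak 's|^#include <malloc\.h>|//#include <malloc.h>|' "$f"
done

patched="$(head -n 1 "$demo_dir/AlignedAllocator.cpp")"
echo "$patched"
```

To use it for real, point the grep at the FastCodeML source folder instead of demo_dir.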

Try again!
105-238:FastCodeML-1.1.0 loloyohe$ make
Scanning dependencies of target fast
[  4%] Building CXX object CMakeFiles/fast.dir/AlignedAllocator.cpp.o
[  8%] Building CXX object CMakeFiles/fast.dir/HighLevelCoordinator.cpp.o
[ 12%] Building CXX object CMakeFiles/fast.dir/CodeMLoptimizer.cpp.o
[ 16%] Building CXX object CMakeFiles/fast.dir/ForestExport.cpp.o
[ 20%] Building CXX object CMakeFiles/fast.dir/ParseParameters.cpp.o
[ 24%] Building CXX object CMakeFiles/fast.dir/VerbosityLevels.cpp.o
[ 28%] Building CXX object CMakeFiles/fast.dir/DAGScheduler.cpp.o
[ 32%] Building CXX object CMakeFiles/fast.dir/TreeAndSetsDependencies.cpp.o
[ 36%] Building CXX object CMakeFiles/fast.dir/WriteResults.cpp.o
[ 40%] Linking CXX executable fast
[100%] Built target fast

There, that's better.
105-238:FastCodeML-1.1.0 loloyohe$ file fast
fast: Mach-O 64-bit executable x86_64

More on execution later.