Monday, June 30, 2014

Getting data assembled now. Not using Sprai--going to try to use Celera now (which I have already installed).

After several hours of downloading a million things, it turns out there is a bug in the svn code, but not if you download the .tar.bz2 file directly from sourceforge. For some reason, kmer will not compile (it must not be updated) if you check it out through svn.

Anyways, here is what I got to work:
Visit: http://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.1/
bzip2 -dc wgs-8.1.tar.bz2 | tar -xf - #don't compile Celera yet
cd wgs-8.1
cd kmer
gmake install
#oops don't have gmake; just make it the same as "make"
sudo ln -s /usr/bin/make /usr/bin/gmake
cd ..
cd samtools
make
cd ..
cd src
gmake
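
An alternative to symlinking gmake system-wide, as a minimal sketch: pick whichever make binary is already on the PATH. The function name pick_make is hypothetical (not part of wgs-assembler), and this only helps for commands typed by hand. If the wgs-8.1 makefiles call gmake by name themselves, the symlink is still needed.

```shell
# pick_make: hypothetical helper, not part of wgs-assembler.
# Prints "gmake" if it is on the PATH, otherwise falls back to "make".
pick_make() {
  if command -v gmake >/dev/null 2>&1; then
    echo gmake
  elif command -v make >/dev/null 2>&1; then
    echo make
  else
    echo "neither gmake nor make found" >&2
    return 1
  fi
}
```

It could be used as: MAKE=$(pick_make) && (cd kmer && "$MAKE" install)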

#I tried getting Figaro and UMDOverlapper to work but I don't want to mess
#things up; let's try this for now

In the README, it says you can run the assembler with:
  wgs-8.1/*/bin/runCA
#in my case, the * is Darwin-i386

The sequences are now kept on the Spare drive
cd /Volumes/Spare/pacbio/C_sowelli

gunzip filtered_subreads.fast*
gunzip reads_of_insert.fast*

#make the FRG wrapper file to use as input
~/wgs-8.1/Darwin-i386/bin/fastqToCA -libraryname GPC -technology pacbio-raw -reads reads_of_insert.fastq >GPC_untrimmed.frg
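
Before wrapping the reads it can help to know how many there are. A minimal sketch, assuming the FASTQ file uses plain 4-line records (no wrapped sequence lines); count_fastq_reads is a hypothetical helper name, not a Celera tool.

```shell
# count_fastq_reads: hypothetical helper.
# Assumes unwrapped FASTQ, i.e. exactly 4 lines per read.
count_fastq_reads() {
  awk 'END { print int(NR / 4) }' "$1"
}
```

Running count_fastq_reads reads_of_insert.fastq gives a number worth having on hand when checking the assembler's logs later.
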
#make the .spec file--for this first run:
#saved as GPC_spec.spec
merSize               = 17
merThreshold          = 0
merDistinct           = 0.9995
merTotal              = 0.995

doOBT                 = 0
doExtendClearRanges   = 0

unitigger             = bogart

ovlErrorRate          = 0.05  #  Compute overlaps up to 5% error
utgGraphErrorRate     = 0.05  #  Unitigs at 5% error
utgMergeErrorRate     = 0.05  #  Unitigs at 5% error
cnsErrorRate          = 0.05  #  Needed to allow ovlErrorRate=0.05
cgwErrorRate          = 0.05  #  Needed to allow ovlErrorRate=0.05

ovlConcurrency        = 16
cnsConcurrency        = 16

ovlThreads            = 1
ovlHashBits           = 22
ovlHashBlockLength    = 10000000
ovlRefBlockSize       = 25000

#cnsReduceUnitigs      = 0 0   #  Always use only uncontained reads for consensus
cnsReuseUnitigs       = 1     #  With no mates, no need to redo consensus

cnsMinFrags           = 1000
cnsPartitions         = 256
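
The spec file is just "option = value  # comment" lines, so it is easy to read a setting back and confirm that, say, all the error rates agree. A minimal sketch: spec_get is a hypothetical helper, and its parsing rules are inferred from the file above, not taken from the runCA documentation.

```shell
# spec_get: hypothetical helper that prints the value of one option
# from a spec file. Usage: spec_get OPTION FILE
spec_get() {
  awk -v key="$1" '
    /^[ \t]*#/ { next }                # skip commented-out options
    {
      line = $0
      sub(/#.*/, "", line)             # drop any trailing comment
      n = index(line, "=")
      if (n == 0) next                 # not an option line
      k = substr(line, 1, n - 1)
      v = substr(line, n + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim whitespace around the name
      gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim whitespace around the value
      if (k == key) print v
    }' "$2"
}
```

For example, spec_get ovlErrorRate GPC_spec.spec and spec_get cnsErrorRate GPC_spec.spec should print the same number for this run.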

#run the assembler
~/wgs-8.1/Darwin-i386/bin/runCA -d /Volumes/Spare/pacbio/C_sowelli/Assembly/GPC-trim -p GPC-trim -s /Volumes/Spare/pacbio/C_sowelli/Assembly/GPC_spec.spec GPC_untrimmed.frg
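
runCA is handed a run directory, a spec file and a .frg file, and any relative path among them resolves against whatever directory the command is launched from. A minimal sketch of a pre-flight check; check_run_inputs is a hypothetical helper, not something runCA provides.

```shell
# check_run_inputs: hypothetical pre-flight check.
# Usage: check_run_inputs RUN_DIR FILE...
# Fails if RUN_DIR is not an absolute path or any FILE is unreadable.
check_run_inputs() {
  case "$1" in
    /*) ;;
    *) echo "run directory is not an absolute path: $1" >&2; return 1 ;;
  esac
  shift
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing input: $f" >&2
      return 1
    fi
  done
}
```

Chaining it in front of the real command (check_run_inputs ... && runCA ...) stops a long run from starting with a bad path.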

Trying with 5% error rates since the sequences are so similar.




Friday, June 27, 2014

We have the Pacbio reads back...first step is to assemble.

I am using Sprai because Sprai will output longer contigs--essential in distinguishing different olfactory receptors.

Sprai requires:
-Python > v 2.6 (which thank goodness we have on the server)
-NCBI BLAST v.2.2.27 (which we also have!)
-Celera Assembler (which we don't have)

Installing Celera Assembler:
Download the file (v. 8.1--could not get v 8.2 working) and navigate to home folder:
  bzip2 -dc wgs-8.1.tar.bz2 | tar -xf -
  cd wgs-8.1
  cd kmer && make install && cd ..
  cd samtools && make && cd ..
  cd src && make && cd ..
  cd ..

For Sprai to work, I changed the source code to accept longer reads.
cd wgs-8.1/src
vi AS_global.h
Change:
#define AS_READ_MAX_NORMAL_LEN_BITS 11
to:
#define AS_READ_MAX_NORMAL_LEN_BITS 15
Woops--I guess the new version is already set to 16!
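
Checking the header first would have saved the edit. A minimal sketch with two hypothetical helpers: read_len_bits pulls the current value out of the header, and max_read_len turns a bit count into a read length, assuming the longest allowed read is 2^bits - 1 bases (my reading of the macro, not confirmed here).

```shell
# read_len_bits: hypothetical helper that prints the value of
# AS_READ_MAX_NORMAL_LEN_BITS from a header such as src/AS_global.h.
read_len_bits() {
  awk '$1 == "#define" && $2 == "AS_READ_MAX_NORMAL_LEN_BITS" { print $3; exit }' "$1"
}

# max_read_len: assumes the longest read is (2^bits - 1) bases.
max_read_len() {
  echo $(( (1 << $1) - 1 ))
}
```

Under that assumption, 11 bits allows reads up to 2047 bases, 15 bits up to 32767, and 16 bits up to 65535.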

Now install sprai:

tar -xzf sprai-0.9.5.1.3.tar.gz
cd sprai-0.9.5.1.3
./waf configure
./waf build

And of course I get an error.
Error #1: 
/Users/loloyohe/sprai-0.9.5.1.3/col2fqcell.h:78:7: error: use of undeclared identifier 'number_of_ballots'
      number_of_ballots += ballot[i];
      ^
....and this continues everywhere "number_of_ballots" appears.

Error #1 Solution:
"myrealigner.c" includes the header "col2fqcell.h"
You can see in myrealigner.c there is no declaration of "number_of_ballots"
In myrealigner.c, paste 
int number_of_ballots = 0;
under
int maximum_ballots = 11;

Error #2 & #3:
/Users/loloyohe/sprai-0.9.5.1.3/col2fqcell.h:25:47: error: function definition is not allowed here
  void set_vals(int col_index, int coded_base){
                                              ^
../myrealigner.c:583:102: error: function definition is not allowed here
    void print_fastq(char *chr, char *seq, char *depth, char *qual, char *base_exists, char *comment){
     
Error #2 & #3 solution: 
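Facepalming the person who wrote this code. Standard C (and C++) does not allow defining functions inside of functions GAHHHHHH. gcc accepts nested functions in C as an extension, but clang (which is what produced the errors above) does not.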

In "col2fqcell.h", near line 26, move:
  void set_vals(int col_index, int coded_base){
    ++ballot[coded_base];
    max_qvs[coded_base] = (max_qvs[coded_base] < (col[col_index].qv-'!')) ? (col[col_index].qv-'!') : max_qvs[coded_base];
  }
outside of the function 
void col2fqcell(){
...
  }

Okay, now the number of errors occurring is worrisome. There are so many undeclared variables and functions. This version of the code should not have been published. I am writing to the group that published this and will follow up.

Thursday, June 12, 2014

Installing FASTA on a Mac:

-ORA says it requires * FASTA r56 distribution. I did not see this on the server but installed the current version (which is 36). Maybe it's a typo in the readme?

Anyways, this one was a little dicier than HMMER:

Start by going to: ftp://ftp.ebi.ac.uk/pub/software/unix/fasta/
#cd to the src directory:
cd src
#make the file (you have to type in the path to the make folder in fasta)
#the ~/Scripts/fasta-36.3.6b part of the path is what you change to make it work on your comp

make -f ~/Scripts/fasta-36.3.6b/make/Makefile.os_x86 all

#now see if it works

 ~/Scripts/fasta-36.3.6b/bin/fasta36 -q ~/Scripts/fasta-36.3.6b/seq/mgstm1.aa ~/Scripts/fasta-36.3.6b/seq/prot_test.lseg
#depending on the version of fasta you downloaded, the command will be different (e.g. fasta35 
#instead of fasta36)
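
Since the binary name depends on the version, a minimal sketch that looks for whichever fasta3x program ended up in the bin directory. find_fasta_bin is a hypothetical helper, and the fasta3[0-9] naming pattern is an assumption based on the fasta35/fasta36 names mentioned here.

```shell
# find_fasta_bin: hypothetical helper.
# Prints the first executable named fasta30..fasta39 in the given bin directory.
find_fasta_bin() {
  for f in "$1"/fasta3[0-9]; do
    if [ -x "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  echo "no fasta3x binary in $1" >&2
  return 1
}
```

For example: FASTA=$(find_fasta_bin ~/Scripts/fasta-36.3.6b/bin) and then run "$FASTA" with the same arguments as the test command.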

seqCat.pl -dcatlist.txt -if

#generates the phylip file for
seqConverter.pl -in -ope -dseqCat_sequences.nex

Installing HMMER v3

Download and unzip the version (from this program with a communist logo: http://hmmer.janelia.org/)
Navigate to the directory where you have moved your downloaded hmmer.. file and type the following commands into the terminal (in bold):

cd hmmer-3.1b1-macosx # move into new directory
./configure #configure
make #build
make check #run the automated tests

make install #install

Smoothest installation ever (assuming you have make)...

Going to install the Olfactory Receptor family Assigner (http://search.cpan.org/~ceratites/ora-1.9.1/lib/Bio/ORA.pm#SEE_ALSO)

Downloaded the .tar.gz

Install Bioperl.

#need to find out version of perl
perl -v
# it is v.5.16.2

#woops of course you have to install fink
http://www.finkproject.org/download/index.php?phpLang=en

#of course there is no fink installation for Mac OS X > 10.8
# http://www.bioperl.org/wiki/Installing_BioPerl_on_Mac_OSX
#following general installation on unix
# http://www.bioperl.org/wiki/Installing_BioPerl_on_Unix

#first install CPAN
perl -MCPAN -e shell #answer a bunch of questions--try to select the automatic option
#CPAN interface will install
cpan>install Bundle::CPAN #a bunch of installing lines happen
#takes several minutes and you have to answer questions in between

cpan>q #quits the CPAN interface

To get the most updated version of BioPerl (which I'm sure I'll regret down the line), install it via cpan (assuming you got cpan working as well).
>perl -MCPAN -e shell
cpan>d /bioperl/
Reading '/home/francisco/.cpan/Metadata'
  Database was generated on Wed, 19 Mar 2014 13:17:02 GMT
Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution    CJFIELDS/BioPerl-1.6.923.tar.gz
Distribution    CJFIELDS/BioPerl-DB-1.006900.tar.gz
Distribution    CJFIELDS/BioPerl-Network-1.006902.tar.gz
Distribution    CJFIELDS/BioPerl-Run-1.006900.tar.gz
cpan>install CJFIELDS/BioPerl-1.6.923.tar.gz 
#now a million things will happen and you should run the tests to make
#sure that it worked

Now you have to set up a local module directory. For some reason this cpan thing seems to work better and they are begging you to use it. In the commands below, change /Users/loloyohe to your own home directory.

>perl -e shell -MCPAN
cpan>o conf makepl_arg PREFIX=/Users/loloyohe/My_Local_Perl_Modules
cpan>o conf mbuildpl_arg "--prefix /Users/loloyohe/My_Local_Perl_Modules"
cpan>o conf commit

Should be done now!
Test to see if it worked:

Quit cpan and type in your terminal:
>perl -MBio::Seq -e 0

If there are no errors, you have installed it correctly :)
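
With a custom PREFIX, perl may not search the new directory by default, so the test can fail even after a clean install. A minimal sketch for locating the installed modules so the directory can be added to PERL5LIB; perl_lib_dir is a hypothetical helper, and the exact subdirectory layout under the prefix varies by perl version, which is why it searches instead of hard-coding a path.

```shell
# perl_lib_dir: hypothetical helper.
# Prints the directory under PREFIX that contains Bio/Seq.pm.
perl_lib_dir() {
  seq_pm=$(find "$1" -type f -path '*/Bio/Seq.pm' 2>/dev/null | head -n 1)
  if [ -z "$seq_pm" ]; then
    echo "Bio/Seq.pm not found under $1" >&2
    return 1
  fi
  echo "${seq_pm%/Bio/Seq.pm}"
}
```

It could be used as: export PERL5LIB="$(perl_lib_dir /Users/loloyohe/My_Local_Perl_Modules):$PERL5LIB" before rerunning the Bio::Seq test.
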

Downloading the files from CIPRES

View output for "babblers_8_newdate_#".

Download "bablers_8_5part_log" and "babblers_8_5part.trees" for all five runs.

All files are now being stored in "Timaliidae->Babblers_Adaptive Radiation->Final_BEAST".
Only the first two runs converged well. Using the consensus tree from all five runs yields an incorrect tree, especially for Dumetia. However, the branch length there is virtually zero. Now I am going to make the consensus tree from the first two runs only.

Dumetia is still wonky. Going to redo the partitions again...