Thursday, June 12, 2014

Installing FASTA on a Mac:

ORA says it requires the "FASTA r56" distribution. I did not see this on the server, so I installed the current version (which is 36). Maybe it's a typo in the readme?

Anyways, this one was a little dicier than HMMER:

Start by going to: ftp://ftp.ebi.ac.uk/pub/software/unix/fasta/
#download and unpack the current version, then cd to the src directory:
cd src
#build it (you have to type in the path to the make folder in fasta)
#the ~/Scripts/fasta-36.3.6b part of the path is what you change to make it work on your computer

make -f ~/Scripts/fasta-36.3.6b/make/Makefile.os_x86 all

#now see if it works

 ~/Scripts/fasta-36.3.6b/bin/fasta36 -q ~/Scripts/fasta-36.3.6b/seq/mgstm1.aa ~/Scripts/fasta-36.3.6b/seq/prot_test.lseg
#depending on the version of fasta you downloaded, the command will be different (e.g. fasta35 
#instead of fasta36)
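
Since the binary name changes with the version, here is a small sketch for finding whichever one got built. It assumes the binaries are named fasta followed by digits (fasta35, fasta36) and sit in the bin folder; the function name find_fasta_binary is my own and not part of the FASTA package.

```python
import os
import re
from pathlib import Path


def find_fasta_binary(bin_dir):
    """Return the highest-numbered fastaNN executable in bin_dir, or None.

    Assumes the naming pattern fasta + digits (e.g. fasta35, fasta36).
    """
    bin_dir = Path(bin_dir)
    if not bin_dir.is_dir():
        return None
    candidates = []
    for path in bin_dir.iterdir():
        match = re.fullmatch(r"fasta(\d+)", path.name)
        # Keep only regular files that the current user may execute.
        if match and path.is_file() and os.access(path, os.X_OK):
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Pick the entry with the largest version number.
    return max(candidates, key=lambda item: item[0])[1]
```

On my setup the call would be find_fasta_binary(Path.home() / "Scripts/fasta-36.3.6b/bin"); change the path to match your own download.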

seqCat.pl -dcatlist.txt -if

#generates the phylip file for
seqConverter.pl -in -ope -dseqCat_sequences.nex

Installing HMMER v3

Download and unzip the current version from http://hmmer.janelia.org/ (the program with the communist logo).
Navigate to the directory where you have moved your downloaded hmmer.. file and type the following commands into the terminal:

cd hmmer-3.1b1-macosx # move into new directory
./configure #configure
make #build
make check #run the automated tests

make install #install

Smoothest installation ever (assuming you have make)...
Going to install the Olfactory Receptor family Assigner (http://search.cpan.org/~ceratites/ora-1.9.1/lib/Bio/ORA.pm#SEE_ALSO)

Downloaded the .tar.gz

Install Bioperl.

#need to find out version of perl
perl -v
# it is v.5.16.2

#whoops, of course you have to install fink
http://www.finkproject.org/download/index.php?phpLang=en

#of course there is no fink installation for Mac OS X > 10.8
# http://www.bioperl.org/wiki/Installing_BioPerl_on_Mac_OSX
#following general installation on unix
# http://www.bioperl.org/wiki/Installing_BioPerl_on_Unix

#first install CPAN
perl -MCPAN -e shell #answer a bunch of questions--try to select automate
#CPAN interface will install
cpan>install Bundle::CPAN #a bunch of installing lines happen
#takes several minutes and you have to answer questions in between

cpan>q #quits the CPAN interface

To get the most up-to-date version of BioPerl (which I'm sure I'll regret down the line), install it via cpan (assuming you got cpan working as well).
>perl -MCPAN -e shell
cpan>d /bioperl/
Reading '/home/francisco/.cpan/Metadata'
  Database was generated on Wed, 19 Mar 2014 13:17:02 GMT
Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution    CJFIELDS/BioPerl-1.6.923.tar.gz
Distribution    CJFIELDS/BioPerl-DB-1.006900.tar.gz
Distribution    CJFIELDS/BioPerl-Network-1.006902.tar.gz
Distribution    CJFIELDS/BioPerl-Run-1.006900.tar.gz
cpan>install CJFIELDS/BioPerl-1.6.923.tar.gz 
#now a million things will happen and you should run the tests to make
#sure that it worked

Now you have to set up a local module directory. For some reason this cpan thing seems to work better and they are begging you to use it. Replace /Users/loloyohe below with your own home directory.

>perl -MCPAN -e shell
cpan>o conf makepl_arg PREFIX=/Users/loloyohe/My_Local_Perl_Modules
cpan>o conf mbuildpl_arg "--prefix /Users/loloyohe/My_Local_Perl_Modules"
cpan>o conf commit

Should be done now!
Test to see if it worked:

Quit cpan and type in your terminal:
>perl -MBio::Seq -e 0

If there are no errors, you have installed it correctly :)

Downloading the files from CIPRES

View output for "babblers_8_newdate_#".

Download "babblers_8_5part_log" and "babblers_8_5part.trees" for all five runs.

All files are now being stored in "Timaliidae->Babblers_Adaptive Radiation->Final_BEAST".
Only the first two runs converged well. Using the consensus tree from all five runs yields an incorrect tree, especially for Dumetia. However, the branch length there is virtually nonexistent. Now I am going to make the consensus tree for the first two runs.

Dumetia is still wonky. Going to redo the partitions again...

Thursday, May 29, 2014

Working in Dr. Stephen Rossiter's lab to learn to assemble my transcriptome data.

[1] Collect all relevant sequences to help identify reads as genes
>in Ensembl:
-in BioMart, select database "Microbat"
-export the gene IDs for olfactory receptors (set this in "Filters"); this will make an Excel sheet of IDs that you will use to sort the attributes
-under "Attributes" -> Sequences-> Check "Ensembl GeneID", "Associated Gene Name", "Ensembl transcript ID", and "coding sequences"
-under "Features"-> Check "Ensembl Gene ID", "Ensembl Transcript ID"

>in Genbank: 
-try to find a link in the paper that leads from the accession numbers to a way of exporting the sequences in FASTA format; if there isn't one, search for the accession range with "#:#[pacc]"
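
To sanity-check whatever comes out of step [1], a rough FASTA reader like the one below is enough to count records and look at the IDs. It is only a sketch: it assumes the header fields are separated by "|" (which is how I expect the BioMart export to look, but check your own file), and read_fasta is just a name I made up.

```python
def read_fasta(lines):
    """Yield (header_fields, sequence) for each record in FASTA-formatted lines.

    Assumption: header fields are separated by "|"; a header without any
    "|" simply comes back as a one-element list.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header.split("|"), "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header.split("|"), "".join(chunks)
```

Passing an open file handle works, e.g. sum(1 for _ in read_fasta(handle)) gives the number of sequences to compare against the number of IDs you exported.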

[2] Run quality control

There are three main steps in pre-processing quality control:
--remove reads with adapters
--remove reads in which unknown nucleotides (N) make up more than 5% of the bases
--remove reads with low quality (more than 20% of the bases in the read have a quality score below 10)
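
The second and third rules above can be sketched in a few lines of Python. This is my own minimal version, not what any particular QC tool does: it assumes Phred+33 quality encoding (check what your sequencing provider used), leaves out adapter removal, and the function names are made up for illustration.

```python
def n_fraction(seq):
    """Fraction of bases in the read that are unknown (N)."""
    if not seq:
        return 1.0  # treat an empty read as entirely unknown
    return seq.upper().count("N") / len(seq)


def low_quality_fraction(qual, min_q=10, offset=33):
    """Fraction of bases whose Phred score is below min_q.

    offset=33 assumes Phred+33 encoded quality strings.
    """
    if not qual:
        return 1.0
    low = sum(1 for ch in qual if ord(ch) - offset < min_q)
    return low / len(qual)


def passes_qc(seq, qual, max_n=0.05, max_low=0.20, min_q=10, offset=33):
    """Keep a read only if it passes both the N rule and the quality rule."""
    if n_fraction(seq) > max_n:
        return False  # more than 5% unknown nucleotides
    if low_quality_fraction(qual, min_q, offset) > max_low:
        return False  # more than 20% of bases below quality 10
    return True
```

The thresholds are keyword arguments so they are easy to loosen or tighten without touching the logic.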

We sequenced the olfactory bulb transcriptome using paired-end long-read HiSeq through the GE SeqWright pipeline, and their quality control only preprocesses the data up through the first step of removing the adapter sequences. Because we still need to filter out bad reads, this involves some creativity.

I first tried to filter out bad reads based on their quality score, but because the data are paired-end, uneven numbers of reads were removed from each file of the pair (~6,000 from one; ~13,000 from the other), and it would be difficult to re-pair the ends once the order is offset.
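
One way around the offset problem is to read both files in lockstep and drop a pair whenever either mate fails, so the two outputs never get out of order. The sketch below is my own and not what any of the tools here do; it assumes plain four-line FASTQ records in the same order in both files, and the keep argument is whatever pass/fail test you want to apply to a single read.

```python
from itertools import zip_longest


def read_fastq(lines):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record: " + header)
        yield header, seq, qual


def filter_pairs(records1, records2, keep):
    """Yield (mate1, mate2) only when both mates pass keep()."""
    for r1, r2 in zip_longest(records1, records2):
        if r1 is None or r2 is None:
            raise ValueError("the two files have different numbers of reads")
        if keep(r1) and keep(r2):
            yield r1, r2
```

You lose a good read whenever its mate is bad, but the surviving reads stay paired line for line.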

I then tried to overlap all of the reads using FLASH, with the intention of then filtering bad reads from the ones that overlap. FLASH worked really well, but only 59% of my pairs overlapped enough, leaving me with only 16 million reads (as opposed to 27 million). Back to square one.

I then installed a program called "Popoolation". No comment.
I followed these instructions for installation. I had to tweak the 

--------------------
This has been sitting in "Drafts" since January. I'm going to publish it anyways.
I am revisiting my babbler project.

Looking at the Tracer files again, I see that the "babblers_8_newdate" task provided the best convergence. I installed SumTrees to find the consensus tree.

#install Setuptools
wget https://bootstrap.pypa.io/ez_setup.py -O - | sudo python
#enter password

#install SumTrees

sudo easy_install -U dendropy

Okay, I realized I had already made the consensus tree back in October!

See directory /Users/loloyohe/Documents/Timaliidae/Babblers Adaptive Radiation/trees/output_trees