Showing posts with label HMMER. Show all posts
Showing posts with label HMMER. Show all posts

Thursday, January 22, 2015

Intention is to extend the ORA pipeline to scan for all chemosensory related genes.

Take 15 random V1Rs from the Young, et al (2010) paper.
-->Should you only take functional genes as training sequences?

At first I was aligning the sequences with ClustalOmega and trying to put the sequences in .msf format, as many HMMER tutorials show. However, I could not for the life of me figure out what was wrong and kept getting the following error:
hmmbuild vno.hmm vno_trainerSeqs.msf
Alignment input open failed.
   couldn't determine alignment input format
   while reading file vno_trainerSeqs.msf

So I gave up on .msf and tried PHYLIP--seemed to have worked fine

head vno_trainerSeqs.msf
 15  384
v1r-10    ----MLKLVIIENMAEIMLFSLDLLLFSTDIL----CFNFPSKMIKLPGF
v1r-06    ---------------------------------------MMNKNSRLYTD
v1r-04    ----------------------------------------------MAVD
v1r-12    -------------------------------------------------M
v1r-01    -------------------------------------MSAHGNSLKTTEE
v1r-14    ---------------------------------------------MLTYD
v1r-15    ---------------------------------------------MSSHK
v1r-03    ---------------------------------------------MDITE
v1r-02    -------------------------------------------MKMTSSN
v1r-08    ---------------------------------------------MSSAK
v1r-11    MVGDTLKLLSPL-MTRY----FFLLFYSTDSSDLNENQHPLDFDEMAFGK
v1r-05    ---------------------------------------------MDARN
v1r-13    ---------------------------------------------MASKD
v1r-07    -------MIHVD-RDSY----PL-----AGFSSSEDKYLSLTTDRKASRE
v1r-09    ---------------------------------------------MATGD

ITIQIFFYPQASFGISANTILLLFHIFTFVFSHRSKSIDMIISHLSLIHI
SNIRNTFFAEIGIGVSANSLLLLFNIFKLICGQRSRLTDLPIGLLSLINL
VAQGVSFLYQTGLGILGNSLLITLYLTSFLLGSKLKPTDLTIIHLALVHT
LSFKKAFYFQAGIGISANIFLLLWHIFTFFKDHKPKNHDLIICHLAFAHI
VALQILLLCQFGVGTVANVFLFVHNFSPVLTGSKQRPRQVILSHMAVANA
DFMCIFHKLQTIIGLFGNSFLLYLYILKLIINQRITLIDKICINLVFSNI
VGLEIVYLTLLLFGILGNMFLIYLQSLKFITDHRKRVINLIIINLALAHT
LSFGIAIVMQFSIGVSVNVFVFLFYAQIISTSYKASFSDLILAHLAFANT
LVVGILLFSQIVMGMLGNSSILFYYVILIFTGKHLTPKDLIIEHLTFANC
WETRIILVAQMGVGILGNTSLFCLCNFTLFTGQKVRCTDIILSQLALANS
VKSGISFLIQTGVGILGNSFLLCFYNLILFTGHKLRPTDLILSQLALANS
LGVGITYLLQSVVGLLGNISLIFYYLIIYYKKHKIKPMDLILMHLIIVNI
FAIGMIL-SQIMVGFLGNFFLLYHYSFLHFTRGMLQSTDLTLKHLTIANS
LVVGIILSLQTTFGILGNFSLLYHYLFLYFTGSRLRSTDLIVKNLIVANL
LVVGMVFLSQTILGILGNFSLLCHYLLLHFTGCRARCTDLILRHLTIANS

Alrighty. Will this now work with ORA?
It appears so! Actually is not too difficult.
[1] Put the alignment file of the training receptors and run "hmmbuild" to make the HMMER probability file.
hmmbuild vno.hmm vno_trainerSeqs.phy
# hmmbuild :: profile HMM construction from multiple sequence alignments
# HMMER 3.1b1 (May 2013); http://hmmer.org/
# Copyright (C) 2013 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# input alignment file:             vno_trainerSeqs.phy
# output HMM file:                  vno.hmm
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# idx name                  nseq  alen  mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------
1     vno_trainerSeqs         15   384   315     2.03  0.590 
[3] In the "scripts" folder of the ORA pipeline, append the vno.hmm text to the or.hmm text. Make sure there is a new line after the last "//" or it will not press properly.

[4] Delete "or.hmm.h3f", "or.hmm.h3i", "or.hmm.h3m", and "or.hmm.h3p". Then repress the new or.hmm:
hmmpress or.hmm
Looks like with V2R, the ORA is picking up things like glutamate receptors (GRM3), and GABA receptors. Wonder why....will need to think about and retrain. I'm assuming the introns are throwing it off.

Wednesday, December 3, 2014


Gave in and installed the new blast+ using homebrew.

The sequence can have no newlines and needs to have a FASTA specific header:
head /Volumes/Ruficollis/ORA_annotations/C_sowelli_ORs_nonewline.fasta
>gb|deg7180000002010|OR7
CTGCACTCACCTATG....
>gb|deg7180000002011|OR51_PSEUDOGENE

sudo makeblastdb -in /Volumes/Ruficollis/ORA_annotations/C_sowelli_ORs_nonewline.fasta -out C_sowelli_ORS_blastdb -dbtype nucl -parse_seqids


awk '{if (substr($0,1,1)==">"){print "\n"$0} else printf("%s",$0);p++;}END{print "\n"}' DR_091_Mobl_MOE_assembled.fasta > joined.fasta

renamed_DR_013_Mored_MOE fasta:
>gnl|some_ID|gene129

Still didn't work.
OKAY GRUMBLE!!!! BLAST AND ITS ARCHAIC WAY OF LIFE >_<

perl ~/ora-1.9.1/scripts/or.pl -sequence DR_004_Arjam_MOE_assembled.part-01.fasta -a -d 

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: die in _initialize, hmm profile not found at /Volumes/Ruficollis/ORA_annotations/to_analyze/or.hmm

It doesn't like not being in its home directory when running the perl script.
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-01.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt1.fasta

Okay so this finally worked. Now time to get hacky. With all of my sequence data I get this annoying perl file open error:


Too many open files at /Library/Perl/5.16/Bio/ORA.pm line 524.

ulimit -a
The OS limits the number of files that perl can open at one time.
ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited

We see that the max is 256. From what I've read on a Mac OSX the max is 1024--but PLEASE someone correct me if I'm wrong.

ulimit -n 1024

Just to be safe, I am going to split my sequence data into 100 fasta files each:
perl ~/Scripts/fasta-splitter.pl DR_004_Arjam_MOE_assembled.fasta -n-parts 100
perl ~/Scripts/fasta-splitter.pl DR_011_Arjam_VNO_assembled.fasta -n-parts 100
perl ~/Scripts/fasta-splitter.pl DR_013_Mored_MOE_assembled.fasta -n-parts 100
perl ~/Scripts/fasta-splitter.pl DR_013_Mored_VNO_assembled.fasta -n-parts 100
perl ~/Scripts/fasta-splitter.pl DR_091_Mobl_MOE_assembled.fasta -n-parts 100

Then run the ORA pipeline in a shell script for each file. AND THEN cat them all together. Yes, I know..very hacky.

#this must be run from within the ORA scripts folder so it can access all of the HMM folders at once

sh run_ORA_DR_004MOE.sh

#What does this shell script look like?
#!/bin/bash
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-001.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt1.fasta
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-002.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt2.fasta
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-003.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt3.fasta
....
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-100.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt100.fasta

cat /Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs* >> /Volumes/Ruficollis/ORA_annotations/DR_004_Arjam_MOE_ORs.fasta

Okay if that doesn't do the trick then I will be letting out another very loud grumble!

Wednesday, July 2, 2014

Plans for today:

[1] Find missing data for high school student
[2] Figure out how to teach RaxML to a high school student (JW slides) COMPLETE
[3] Get Gazey-Staley algorithm set up in R and running. COMPLETE
      -Okay, so we don't actually have to do this step, as its just a measure of cloning success (and since we didn't clone. We don't really need to measure this. What we do need to measure (but not right away, is the # of sequences (reads) v. # genes (unique contigs). This will tell us if we have sampled enough of the genome for the OR genes given our degenerate primers.

Anyways, to download the R code for the Gazey-Staley algorithm along with a heterogeneity test (with very good documentation), visit:
http://batlab.ucd.ie/~spuechmaille/

[4] Figure out most recent problem with babbler paper.
[5] Set up ORA pipeline COMPLETE
____________________________________________
To install ORA (after having install BioPerl, HMMER, and FASTA), download the .zip file from:
http://search.cpan.org/~ceratites/ora-1.9.1/lib/Bio/ORA.pm#DRIVER_SCRIPT

Unzip and navigate to the ORA folder.

#run the perl script to make the "makefile"

perl Makefile.PL
sudo make install
perl Build.PL

WOOOOHOOOO got it working.
There are two main issues (assuming you have everything installed).
1) You must run the perl script in the /ora-*/scripts folder because it must access the .hmm files
2) When you install ORA, it includes the older version of HMMER. The error message you get when you run "or.pl", is: 
...
binary auxfiles are in an outdated HMMER format (3/b); please hmmpress your HMM file again

To fix this, do the following (assuming you have HMMER installed).
-In the /ora-*/scripts folder, there should be several files related to HMMER
Delete the following: or.hmm.h3f     or.hmm.h3i     or.hmm.h3m     or.hmm.h3p
On the command line, type:
hmmpress or.hmm

That should fix things.

To run the ORA:
perl or.pl --sequence ~/Documents/Analyses/Carrollia_OR/GPC-trim.deg.fasta
#still playing with all of the fun parameters

Now I need to trim the data of a certain length.
By the end of the weekend, it would be great to make a tree from the sequences.

Thursday, June 12, 2014

Installing *HMMER v3

Download and unzip the version (from this program with a communist logo: http://hmmer.janelia.org/)
Navigate to the directory where you have moved your downloaded hmmer.. file and type the following commands into the terminal (in bold):

cd hmmer-3.1b1-macosx # move into new directory
./configure #configure
make #build
make check #run the automated tests

make install #install

Smoothest installation ever (assuming you have make)...