Laurel's Lab Notebook: HMMER

Showing posts with label HMMER. Show all posts

Thursday, January 22, 2015

Intention is to extend the ORA pipeline to scan for all chemosensory related genes.

Take 15 random V1Rs from the Young, et al (2010) paper.
-->Should you only take functional genes as training sequences?

At first I was aligning the sequences with ClustalOmega and trying to put the sequences in .msf format, as many HMMER tutorials show. However, I could not for the life of me figure out what was wrong and kept getting the following error:

hmmbuild vno.hmm vno_trainerSeqs.msf

Alignment input open failed.

   couldn't determine alignment input format

   while reading file vno_trainerSeqs.msf

So I gave up on .msf and tried PHYLIP--seemed to have worked fine

head vno_trainerSeqs.msf

 15  384

v1r-10    ----MLKLVIIENMAEIMLFSLDLLLFSTDIL----CFNFPSKMIKLPGF

v1r-06    ---------------------------------------MMNKNSRLYTD

v1r-04    ----------------------------------------------MAVD

v1r-12    -------------------------------------------------M

v1r-01    -------------------------------------MSAHGNSLKTTEE

v1r-14    ---------------------------------------------MLTYD

v1r-15    ---------------------------------------------MSSHK

v1r-03    ---------------------------------------------MDITE

v1r-02    -------------------------------------------MKMTSSN

v1r-08    ---------------------------------------------MSSAK

v1r-11    MVGDTLKLLSPL-MTRY----FFLLFYSTDSSDLNENQHPLDFDEMAFGK

v1r-05    ---------------------------------------------MDARN

v1r-13    ---------------------------------------------MASKD

v1r-07    -------MIHVD-RDSY----PL-----AGFSSSEDKYLSLTTDRKASRE

v1r-09    ---------------------------------------------MATGD

ITIQIFFYPQASFGISANTILLLFHIFTFVFSHRSKSIDMIISHLSLIHI

SNIRNTFFAEIGIGVSANSLLLLFNIFKLICGQRSRLTDLPIGLLSLINL

VAQGVSFLYQTGLGILGNSLLITLYLTSFLLGSKLKPTDLTIIHLALVHT

LSFKKAFYFQAGIGISANIFLLLWHIFTFFKDHKPKNHDLIICHLAFAHI

VALQILLLCQFGVGTVANVFLFVHNFSPVLTGSKQRPRQVILSHMAVANA

DFMCIFHKLQTIIGLFGNSFLLYLYILKLIINQRITLIDKICINLVFSNI

VGLEIVYLTLLLFGILGNMFLIYLQSLKFITDHRKRVINLIIINLALAHT

LSFGIAIVMQFSIGVSVNVFVFLFYAQIISTSYKASFSDLILAHLAFANT

LVVGILLFSQIVMGMLGNSSILFYYVILIFTGKHLTPKDLIIEHLTFANC

WETRIILVAQMGVGILGNTSLFCLCNFTLFTGQKVRCTDIILSQLALANS

VKSGISFLIQTGVGILGNSFLLCFYNLILFTGHKLRPTDLILSQLALANS

LGVGITYLLQSVVGLLGNISLIFYYLIIYYKKHKIKPMDLILMHLIIVNI

FAIGMIL-SQIMVGFLGNFFLLYHYSFLHFTRGMLQSTDLTLKHLTIANS

LVVGIILSLQTTFGILGNFSLLYHYLFLYFTGSRLRSTDLIVKNLIVANL

LVVGMVFLSQTILGILGNFSLLCHYLLLHFTGCRARCTDLILRHLTIANS

Alrighty. Will this now work with ORA?
It appears so! Actually is not too difficult.
[1] Put the alignment file of the training receptors and run "hmmbuild" to make the HMMER probability file.

hmmbuild vno.hmm vno_trainerSeqs.phy

# hmmbuild :: profile HMM construction from multiple sequence alignments

# HMMER 3.1b1 (May 2013); http://hmmer.org/

# Copyright (C) 2013 Howard Hughes Medical Institute.

# Freely distributed under the GNU General Public License (GPLv3).

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# input alignment file:             vno_trainerSeqs.phy

# output HMM file:                  vno.hmm

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# idx name                  nseq  alen  mlen eff_nseq re/pos description

#---- -------------------- ----- ----- ----- -------- ------ -----------

1     vno_trainerSeqs         15   384   315     2.03  0.590 

[3] In the "scripts" folder of the ORA pipeline, append the vno.hmm text to the or.hmm text. Make sure there is a new line after the last "//" or it will not press properly.

[4] Delete "or.hmm.h3f", "or.hmm.h3i", "or.hmm.h3m", and "or.hmm.h3p". Then repress the new or.hmm:
hmmpress or.hmm
Looks like with V2R, the ORA is picking up things like glutamate receptors (GRM3), and GABA receptors. Wonder why....will need to think about and retrain. I'm assuming the introns are throwing it off.

Wednesday, December 3, 2014

Gave in and installed the new blast+ using homebrew.

The sequence can have no newlines and needs to have a FASTA specific header:

head /Volumes/Ruficollis/ORA_annotations/C_sowelli_ORs_nonewline.fasta

>gb|deg7180000002010|OR7

CTGCACTCACCTATG....

>gb|deg7180000002011|OR51_PSEUDOGENE

sudo makeblastdb -in /Volumes/Ruficollis/ORA_annotations/C_sowelli_ORs_nonewline.fasta -out C_sowelli_ORS_blastdb -dbtype nucl -parse_seqids

awk '{if (substr($0,1,1)==">"){print "\n"$0} else printf("%s",$0);p++;}END{print "\n"}' DR_091_Mobl_MOE_assembled.fasta > joined.fasta

renamed_DR_013_Mored_MOE fasta:

>gnl|some_ID|gene129

Still didn't work.
OKAY GRUMBLE!!!! BLAST AND ITS ARCHAIC WAY OF LIFE >_<

perl ~/ora-1.9.1/scripts/or.pl -sequence DR_004_Arjam_MOE_assembled.part-01.fasta -a -d 

------------- EXCEPTION: Bio::Root::Exception -------------

MSG: die in _initialize, hmm profile not found at /Volumes/Ruficollis/ORA_annotations/to_analyze/or.hmm

It doesn't like not being in its home directory when running the perl script.

perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-01.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt1.fasta

Okay so this finally worked. Now time to get hacky. With all of my sequence data I get this annoying perl file open error:

Too many open files at /Library/Perl/5.16/Bio/ORA.pm line 524.

ulimit -a

The OS limits the number of files that perl can open at one time.

ulimit -a

core file size          (blocks, -c) 0

data seg size           (kbytes, -d) unlimited

file size               (blocks, -f) unlimited

max locked memory       (kbytes, -l) unlimited

max memory size         (kbytes, -m) unlimited

open files                      (-n) 256

pipe size            (512 bytes, -p) 1

stack size              (kbytes, -s) 8192

cpu time               (seconds, -t) unlimited

max user processes              (-u) 709

virtual memory          (kbytes, -v) unlimited

We see that the max is 256. From what I've read on a Mac OSX the max is 1024--but PLEASE someone correct me if I'm wrong.

ulimit -n 1024

Just to be safe, I am going to split my sequence data into 100 fasta files each:

perl ~/Scripts/fasta-splitter.pl DR_004_Arjam_MOE_assembled.fasta -n-parts 100

perl ~/Scripts/fasta-splitter.pl DR_011_Arjam_VNO_assembled.fasta -n-parts 100

perl ~/Scripts/fasta-splitter.pl DR_013_Mored_MOE_assembled.fasta -n-parts 100

perl ~/Scripts/fasta-splitter.pl DR_013_Mored_VNO_assembled.fasta -n-parts 100

perl ~/Scripts/fasta-splitter.pl DR_091_Mobl_MOE_assembled.fasta -n-parts 100

Then run the ORA pipeline in a shell script for each file. AND THEN cat them all together. Yes, I know..very hacky.

#this must be run from within the ORA scripts folder so it can access all of the HMM folders at once

sh run_ORA_DR_004MOE.sh

#What does this shell script look like?

#!/bin/bash

perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-001.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt1.fasta

perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-002.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt2.fasta

perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-003.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt3.fasta

....

perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-100.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt100.fasta

cat /Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs* >> /Volumes/Ruficollis/ORA_annotations/DR_004_Arjam_MOE_ORs.fasta

Okay if that doesn't do the trick then I will be letting out another very loud grumble!

Wednesday, July 2, 2014

Plans for today:

[1] Find missing data for high school student
[2] Figure out how to teach RaxML to a high school student (JW slides) COMPLETE
[3] Get Gazey-Staley algorithm set up in R and running. COMPLETE
-Okay, so we don't actually have to do this step, as its just a measure of cloning success (and since we didn't clone. We don't really need to measure this. What we do need to measure (but not right away, is the # of sequences (reads) v. # genes (unique contigs). This will tell us if we have sampled enough of the genome for the OR genes given our degenerate primers.

Anyways, to download the R code for the Gazey-Staley algorithm along with a heterogeneity test (with very good documentation), visit:
http://batlab.ucd.ie/~spuechmaille/

[4] Figure out most recent problem with babbler paper.
[5] Set up ORA pipeline COMPLETE
____________________________________________
To install ORA (after having install BioPerl, HMMER, and FASTA), download the .zip file from:
http://search.cpan.org/~ceratites/ora-1.9.1/lib/Bio/ORA.pm#DRIVER_SCRIPT

Unzip and navigate to the ORA folder.

#run the perl script to make the "makefile"

perl Makefile.PL

sudo make install

perl Build.PL

WOOOOHOOOO got it working.

There are two main issues (assuming you have everything installed).

1) You must run the perl script in the /ora-*/scripts folder because it must access the .hmm files

2) When you install ORA, it includes the older version of HMMER. The error message you get when you run "or.pl", is:

...

binary auxfiles are in an outdated HMMER format (3/b); please hmmpress your HMM file again

To fix this, do the following (assuming you have HMMER installed).

-In the /ora-*/scripts folder, there should be several files related to HMMER

Delete the following: or.hmm.h3f or.hmm.h3i or.hmm.h3m or.hmm.h3p

On the command line, type:

hmmpress or.hmm

That should fix things.

To run the ORA:

perl or.pl --sequence ~/Documents/Analyses/Carrollia_OR/GPC-trim.deg.fasta

#still playing with all of the fun parameters

Now I need to trim the data of a certain length.

By the end of the weekend, it would be great to make a tree from the sequences.

Thursday, June 12, 2014

Installing *HMMER v3

Download and unzip the version (from this program with a communist logo: http://hmmer.janelia.org/)

Navigate to the directory where you have moved your downloaded hmmer.. file and type the following commands into the terminal (in bold):

cd hmmer-3.1b1-macosx # move into new directory

./configure #configure

make #build

make check #run the automated tests

make install #install

Smoothest installation ever (assuming you have make)...