Laurel's Lab Notebook

The intention is to extend the ORA pipeline to scan for all chemosensory-related genes.

Take 15 random V1Rs from the Young et al. (2010) paper.
-->Should you only take functional genes as training sequences?
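
One way to draw the random training set, as a rough sketch only: it assumes the V1R sequences are in a single FASTA file, that GNU `shuf` is available, and that no header line contains a tab. The function name `sample_fasta` and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: print N randomly chosen records from a FASTA file.
# usage: sample_fasta N FILE
sample_fasta() {
    n=$1
    fasta=$2
    # Collapse each record to one line: header, a tab, then the joined sequence.
    awk '/^>/ { if (rec != "") print rec; rec = $0 "\t"; next }
         { rec = rec $0 }
         END { if (rec != "") print rec }' "$fasta" |
    # Pick N records at random (GNU coreutils).
    shuf -n "$n" |
    # Put the header and the sequence back on separate lines.
    tr '\t' '\n'
}
```

Usage would look like `sample_fasta 15 all_v1r.fasta > vno_trainerSeqs.fasta`. If only functional genes should be used for training, the input file would have to be filtered before this step.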

At first I was aligning the sequences with Clustal Omega and trying to put them in .msf format, as many HMMER tutorials show. However, I could not for the life of me figure out what was wrong, and I kept getting the following error:

hmmbuild vno.hmm vno_trainerSeqs.msf

Alignment input open failed.

   couldn't determine alignment input format

   while reading file vno_trainerSeqs.msf

So I gave up on .msf and tried PHYLIP, which seems to have worked fine.
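
For reference, a sketch of asking Clustal Omega for PHYLIP output directly. It assumes the `clustalo` command-line program with its `-i`, `-o` and `--outfmt` options; the wrapper name `align_to_phylip` and the input file name in the usage line are hypothetical.

```shell
# Hypothetical wrapper: align a FASTA file and write the result as PHYLIP.
# usage: align_to_phylip INPUT.fasta OUTPUT.phy
align_to_phylip() {
    in=$1
    out=$2
    # --outfmt=phy requests PHYLIP instead of the default output format.
    clustalo -i "$in" -o "$out" --outfmt=phy
}
```

Usage would look like `align_to_phylip vno_trainerSeqs.fasta vno_trainerSeqs.phy`.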

head vno_trainerSeqs.phy

 15  384

v1r-10    ----MLKLVIIENMAEIMLFSLDLLLFSTDIL----CFNFPSKMIKLPGF

v1r-06    ---------------------------------------MMNKNSRLYTD

v1r-04    ----------------------------------------------MAVD

v1r-12    -------------------------------------------------M

v1r-01    -------------------------------------MSAHGNSLKTTEE

v1r-14    ---------------------------------------------MLTYD

v1r-15    ---------------------------------------------MSSHK

v1r-03    ---------------------------------------------MDITE

v1r-02    -------------------------------------------MKMTSSN

v1r-08    ---------------------------------------------MSSAK

v1r-11    MVGDTLKLLSPL-MTRY----FFLLFYSTDSSDLNENQHPLDFDEMAFGK

v1r-05    ---------------------------------------------MDARN

v1r-13    ---------------------------------------------MASKD

v1r-07    -------MIHVD-RDSY----PL-----AGFSSSEDKYLSLTTDRKASRE

v1r-09    ---------------------------------------------MATGD

ITIQIFFYPQASFGISANTILLLFHIFTFVFSHRSKSIDMIISHLSLIHI

SNIRNTFFAEIGIGVSANSLLLLFNIFKLICGQRSRLTDLPIGLLSLINL

VAQGVSFLYQTGLGILGNSLLITLYLTSFLLGSKLKPTDLTIIHLALVHT

LSFKKAFYFQAGIGISANIFLLLWHIFTFFKDHKPKNHDLIICHLAFAHI

VALQILLLCQFGVGTVANVFLFVHNFSPVLTGSKQRPRQVILSHMAVANA

DFMCIFHKLQTIIGLFGNSFLLYLYILKLIINQRITLIDKICINLVFSNI

VGLEIVYLTLLLFGILGNMFLIYLQSLKFITDHRKRVINLIIINLALAHT

LSFGIAIVMQFSIGVSVNVFVFLFYAQIISTSYKASFSDLILAHLAFANT

LVVGILLFSQIVMGMLGNSSILFYYVILIFTGKHLTPKDLIIEHLTFANC

WETRIILVAQMGVGILGNTSLFCLCNFTLFTGQKVRCTDIILSQLALANS

VKSGISFLIQTGVGILGNSFLLCFYNLILFTGHKLRPTDLILSQLALANS

LGVGITYLLQSVVGLLGNISLIFYYLIIYYKKHKIKPMDLILMHLIIVNI

FAIGMIL-SQIMVGFLGNFFLLYHYSFLHFTRGMLQSTDLTLKHLTIANS

LVVGIILSLQTTFGILGNFSLLYHYLFLYFTGSRLRSTDLIVKNLIVANL

LVVGMVFLSQTILGILGNFSLLCHYLLLHFTGCRARCTDLILRHLTIANS
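
The listing above starts with the PHYLIP header line, which gives the number of sequences and the alignment length. A file extension says nothing about the actual contents, so a quick check of that first line can help. This is only a sketch: the function name `looks_like_phylip` is hypothetical, and it tests nothing beyond the header.

```shell
# Hypothetical check: succeed if the first line is two whitespace-separated
# integers, as in a PHYLIP header such as " 15  384".
# usage: looks_like_phylip FILE
looks_like_phylip() {
    head -n 1 "$1" |
    grep -Eq '^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]*$'
}
```

Usage would look like `looks_like_phylip vno_trainerSeqs.phy && echo "PHYLIP header found"`.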

Alrighty. Will this now work with ORA?
It appears so! It is actually not too difficult.
[1] Start with the alignment file of the training receptors.
[2] Run "hmmbuild" on it to make the HMMER profile file.

hmmbuild vno.hmm vno_trainerSeqs.phy

# hmmbuild :: profile HMM construction from multiple sequence alignments

# HMMER 3.1b1 (May 2013); http://hmmer.org/

# Copyright (C) 2013 Howard Hughes Medical Institute.

# Freely distributed under the GNU General Public License (GPLv3).

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# input alignment file:             vno_trainerSeqs.phy

# output HMM file:                  vno.hmm

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# idx name                  nseq  alen  mlen eff_nseq re/pos description

#---- -------------------- ----- ----- ----- -------- ------ -----------

1     vno_trainerSeqs         15   384   315     2.03  0.590 

[3] In the "scripts" folder of the ORA pipeline, append the vno.hmm text to the or.hmm text. Make sure there is a new line after the last "//" or it will not press properly.

[4] Delete "or.hmm.h3f", "or.hmm.h3i", "or.hmm.h3m", and "or.hmm.h3p". Then re-press the new or.hmm:
hmmpress or.hmm
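
The append and re-press steps can be sketched as one small shell function. This is an illustration, not part of the ORA pipeline: the name `add_profile` is hypothetical, and it assumes it is run from the "scripts" folder with `hmmpress` on the PATH.

```shell
# Hypothetical helper: append a new profile to an HMM database and re-press it.
# usage: add_profile NEW.hmm DATABASE.hmm
add_profile() {
    new=$1
    db=$2

    # Make sure the database ends with a newline before appending, so the
    # new profile starts on its own line after the last "//".
    if [ -s "$db" ] && [ -n "$(tail -c 1 "$db")" ]; then
        printf '\n' >> "$db"
    fi
    cat "$new" >> "$db"
    # Likewise make sure the combined file ends with a newline.
    if [ -n "$(tail -c 1 "$db")" ]; then
        printf '\n' >> "$db"
    fi

    # Remove the old binary index files, then rebuild them.
    rm -f "$db.h3f" "$db.h3i" "$db.h3m" "$db.h3p"
    hmmpress "$db"
}
```

Usage would look like `add_profile vno.hmm or.hmm`. Keeping a backup copy of the original or.hmm first is probably wise, since the function modifies it in place.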

With V2R, it looks like ORA is picking up things like glutamate receptors (GRM3) and GABA receptors. I wonder why... I will need to think about it and retrain. I'm assuming the introns are throwing it off.


Thursday, January 22, 2015
