Thursday, January 22, 2015

The intention is to extend the ORA pipeline to scan for all chemosensory-related genes.

Take 15 random V1Rs from the Young et al. (2010) paper.
--> Should only functional genes be used as training sequences?

At first I aligned the sequences with Clustal Omega and tried to write the alignment out in .msf format, as many HMMER tutorials show. However, hmmbuild kept failing with the following error, and I could not for the life of me figure out what was wrong:
hmmbuild vno.hmm vno_trainerSeqs.msf
Alignment input open failed.
   couldn't determine alignment input format
   while reading file vno_trainerSeqs.msf

So I gave up on .msf and tried PHYLIP instead, which seems to have worked fine:
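
Since the error above is hmmbuild failing to recognise the input format, it can help to look at what the file actually starts with before blaming the alignment itself. Below is a minimal sketch of such a check. The function name guess_aln_format is hypothetical (it is not part of HMMER or ORA), and the header patterns are simplified assumptions about what each format usually looks like. It is also worth checking the --informat options in the HMMER user guide, since MSF may simply not be among the formats HMMER 3 reads.

```shell
# guess_aln_format FILE
# Hypothetical helper: print a rough guess at the alignment format of FILE,
# based only on simplified header conventions (assumptions, not a full parser).
guess_aln_format() {
    # First non-blank line of the file
    first=$(grep -v '^[[:space:]]*$' "$1" | head -n 1)
    case $first in
        '# STOCKHOLM'*) echo stockholm ;;   # Stockholm files start with this tag
        CLUSTAL*)       echo clustal ;;     # Clustal files start with "CLUSTAL"
        '>'*)           echo afa ;;         # aligned FASTA starts with a ">" header
        *)
            # PHYLIP: first line is two integers (number of sequences, alignment length)
            if printf '%s\n' "$first" | grep -Eq '^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]*$'; then
                echo phylip
            # GCG MSF: header block contains a line with "MSF:"
            elif head -n 20 "$1" | grep -q 'MSF:'; then
                echo msf
            else
                echo unknown
            fi ;;
    esac
}
```

Calling guess_aln_format on the alignment file before running hmmbuild gives a quick sanity check that the file extension and the actual contents agree.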

head vno_trainerSeqs.phy
 15  384
v1r-10    ----MLKLVIIENMAEIMLFSLDLLLFSTDIL----CFNFPSKMIKLPGF
v1r-06    ---------------------------------------MMNKNSRLYTD
v1r-04    ----------------------------------------------MAVD
v1r-12    -------------------------------------------------M
v1r-01    -------------------------------------MSAHGNSLKTTEE
v1r-14    ---------------------------------------------MLTYD
v1r-15    ---------------------------------------------MSSHK
v1r-03    ---------------------------------------------MDITE
v1r-02    -------------------------------------------MKMTSSN
v1r-08    ---------------------------------------------MSSAK
v1r-11    MVGDTLKLLSPL-MTRY----FFLLFYSTDSSDLNENQHPLDFDEMAFGK
v1r-05    ---------------------------------------------MDARN
v1r-13    ---------------------------------------------MASKD
v1r-07    -------MIHVD-RDSY----PL-----AGFSSSEDKYLSLTTDRKASRE
v1r-09    ---------------------------------------------MATGD

ITIQIFFYPQASFGISANTILLLFHIFTFVFSHRSKSIDMIISHLSLIHI
SNIRNTFFAEIGIGVSANSLLLLFNIFKLICGQRSRLTDLPIGLLSLINL
VAQGVSFLYQTGLGILGNSLLITLYLTSFLLGSKLKPTDLTIIHLALVHT
LSFKKAFYFQAGIGISANIFLLLWHIFTFFKDHKPKNHDLIICHLAFAHI
VALQILLLCQFGVGTVANVFLFVHNFSPVLTGSKQRPRQVILSHMAVANA
DFMCIFHKLQTIIGLFGNSFLLYLYILKLIINQRITLIDKICINLVFSNI
VGLEIVYLTLLLFGILGNMFLIYLQSLKFITDHRKRVINLIIINLALAHT
LSFGIAIVMQFSIGVSVNVFVFLFYAQIISTSYKASFSDLILAHLAFANT
LVVGILLFSQIVMGMLGNSSILFYYVILIFTGKHLTPKDLIIEHLTFANC
WETRIILVAQMGVGILGNTSLFCLCNFTLFTGQKVRCTDIILSQLALANS
VKSGISFLIQTGVGILGNSFLLCFYNLILFTGHKLRPTDLILSQLALANS
LGVGITYLLQSVVGLLGNISLIFYYLIIYYKKHKIKPMDLILMHLIIVNI
FAIGMIL-SQIMVGFLGNFFLLYHYSFLHFTRGMLQSTDLTLKHLTIANS
LVVGIILSLQTTFGILGNFSLLYHYLFLYFTGSRLRSTDLIVKNLIVANL
LVVGMVFLSQTILGILGNFSLLCHYLLLHFTGCRARCTDLILRHLTIANS

Alrighty. Will this now work with ORA?
It appears so! It is actually not too difficult.
[1] Prepare the alignment file of the training receptors.
[2] Run "hmmbuild" on it to make the HMMER profile file:
hmmbuild vno.hmm vno_trainerSeqs.phy
# hmmbuild :: profile HMM construction from multiple sequence alignments
# HMMER 3.1b1 (May 2013); http://hmmer.org/
# Copyright (C) 2013 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# input alignment file:             vno_trainerSeqs.phy
# output HMM file:                  vno.hmm
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# idx name                  nseq  alen  mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------
1     vno_trainerSeqs         15   384   315     2.03  0.590 
[3] In the "scripts" folder of the ORA pipeline, append the contents of vno.hmm to or.hmm. Make sure there is a newline after the last "//", or the file will not press properly.

[4] Delete "or.hmm.h3f", "or.hmm.h3i", "or.hmm.h3m", and "or.hmm.h3p". Then re-press the new or.hmm:
hmmpress or.hmm
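
Steps [3] and [4] can be sketched as a small shell function. This is only a sketch: the function name append_profile is made up for illustration, and it assumes both HMM files are plain text in the current directory. It appends one HMM file to another, makes sure a newline follows the last "//", and removes the stale pressed index files so hmmpress can be run again.

```shell
# append_profile SRC DEST
# Hypothetical helper: append the HMM text in SRC to DEST, make sure the
# result ends with a newline, and remove DEST's old pressed index files.
append_profile() {
    src=$1
    dest=$2
    # If DEST is non-empty and does not end in a newline, add one first,
    # so the appended profile starts on its own line.
    if [ -s "$dest" ] && [ -n "$(tail -c 1 "$dest")" ]; then
        printf '\n' >> "$dest"
    fi
    cat "$src" >> "$dest"
    # Make sure there is a newline after the final "//".
    if [ -n "$(tail -c 1 "$dest")" ]; then
        printf '\n' >> "$dest"
    fi
    # Remove the stale binary indexes left by the previous hmmpress run.
    rm -f "$dest.h3f" "$dest.h3i" "$dest.h3m" "$dest.h3p"
}

# Example use (run in the ORA "scripts" folder):
#   append_profile vno.hmm or.hmm && hmmpress or.hmm
```

As far as I know, hmmpress also has a -f option that overwrites existing pressed files, which would make the manual delete unnecessary, but deleting them explicitly matches the steps above.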
With V2Rs, it looks like ORA is picking up things like glutamate receptors (GRM3) and GABA receptors. I wonder why... I will need to think about it and retrain. I'm assuming the introns are throwing it off.
