Showing posts with label olfactory. Show all posts
Showing posts with label olfactory. Show all posts

Thursday, July 7, 2016

installing mysql

I am hoping to run a set of scripts that will identify and index olfactory receptors using BLAST and mysql. That means I need to sort out how mysql is installed on noctilio, which should be great fun.

First thing I noticed when I downloaded mysql, is that we have 3 other version of mysql on the server! :)

I knew this was going to cause some problems so I needed to uninstall all versions of MySQL in order to get this working properly. I basically followed this awesome post. Here it is reiterated for Mac OsX Yosemite:

Remove MySQL completely per The Tech Lab

  • p-ax | grep mysql

  • stop and kill any MySQL processes
  • brew remove mysql
  • brew cleanup
  • sudo rm /usr/local/mysql
  • sudo rm -rf /usr/local/var/mysql
  • sudo rm -rf /usr/local/mysql*
  • sudo rm ~/Library/LaunchAgents/homebrew.mxcl.mysql.plist
  • sudo rm -rf /Library/StartupItems/MySQLCOM
  • sudo rm -rf /Library/PreferencePanes/My*
  • launchctl unload -~/Library/LaunchAgents/homebrew.mxcl.mysql.plist
  • edit /etc/hostconfig and remove the line MYSQLCOM=-YES-
  • rm -rf ~/Library/PreferencePanes/My*
  • sudo rm -rf /Library/Receipts/mysql*
  • sudo rm -rf /Library/Receipts/MySQL*
  • sudo rm -rf /private/var/db/receipts/*mysql*
  • restart your computer just to ensure any MySQL processes are killed
  • try to run mysql, it shouldn't work

Brew install MySQL per user Sedorner from this StackOverflow answer

  • brew doctor and fix any errors
  • brew update
  • brew install mysql
  • unset TMPDIR
  • mysqld -initialize --verbose --user=whoami --basedir="$(brew --prefix mysql)" --datadir=/usr/local/var/mysql --tmpdir=/tmp #note the above command has been edited to replace the deprecated form
  • mysql.server start
  • run the commands Brew suggests, add MySQL to launchctl so it automatically launches at startup
Okay now to have MySQL start at launch, follow this lovely post. Homebrew can basically set this up for you.
brew tap homebrew/services
Apparently it is no longer supported, so also run these commands:
brew untap homebrew/boneyard brew tap gapple/services

When you run this command, mysql should start:
brew services start mysql

Friday, October 16, 2015

Installing pbh5tools on OSX yosemite

I need to assemble PacBio amplicons of olfactory receptors. One of the main challenges I have been having is that most of the PacBio tools (SMRT-analysis, Celara assembler) focus on assembling whole genomes. Using an approach from Larson, et al. (2014) in BMC Genomics, I am going to do quality filtering using pbh5tools that will filter the reads at a minimum of passes (e.g. 4; though my mean number of passes is quite high for the data (~20)).

To install pbh5tools, I dealt with a series of challenges on Yosemite:
Download from github.
cd pbh5tools-master/
sudo python setup.py install

I ran into this pesky error.
In file included from /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/include/numpy/ndarraytypes.h:1760:
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: 
      "Using deprecated NumPy API, disable it by "          "#defining
      NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
 ^
In file included from /tmp/easy_install-_W_eCV/h5py-2.5.0/h5py/defs.c:279:
/tmp/easy_install-_W_eCV/h5py-2.5.0/h5py/api_compat.h:27:10: fatal error: 
      'hdf5.h' file not found
#include "hdf5.h"
         ^
1 warning and 1 error generated.

error: Setup script exited with error: command 'cc' failed with exit status 1
First I tries to install hdf5 tool packages and then export the path there, but that wasn't working.

From the setup.py, I see that it needs:
    install_requires=[
        'pbcore >= 0.8.0',
        'numpy >= 1.6.0',
        'h5py >= 1.3.0'
        ]

sudo pip install pbcore
sudo pip install numpy
sudo pip install h5py

Then,
sudo python setup.py install
...
Using /Library/Python/2.7/site-packages
Searching for pysam==0.8.3
Best match: pysam 0.8.3
pysam 0.8.3 is already the active version in easy-install.pth

Using /Library/Python/2.7/site-packages
Finished processing dependencies for pbh5tools==0.8.0
Voila!

Wednesday, June 24, 2015

Running GARLI from our Model-O-Matic code.

If we remember from our model-o-matic output and sort by the lowest AIC, we can see the best model for OR6 is CodonM0.0.EQU+4dG. It is interesting how all codon models are preferred to the nucleotide models! This turned out to be the case in every set of alignments.




Things are looking good. I'll keep you posted on the results.

Here is the following best-fit models for each of the OR gene families (see summary files with AIC and likelihoods in /Volumes/Yango/Hayden_batORs/model_o_matic/excel_summaries):

OR137      CodonM0.0.EQU+4dG
OR213      CodonM0.0F64+4dG
OR4          CodonM0.0.F64+4dG
OR589      CodonM0.0.F64+4dG
OR6          CodonM0.0.EQU + 4dG
OR10        CodonM0.0.F64+4dG
OR11        CodonM0.0F64+4dG
OR51        CodonM0.0F64+4dG
OR52        CodonM0.0F64+4dG

Those highlighted in pink were run in model-O-matic as the -100normal parameter, as the program would crash if it was run as -normal. My thoughts are there are too many taxa, but other than than that I am at a loss.

Now we can apply this model to GARLI using our tree output. There are two options for running GARLI (at least that I use). One is on the CIPRES server, the other can be from command line. I am currently having CPU usage problems on CIPRES so I am going to run it from command.

Installing GARLI is easy if you have homebrew.
105-238:~ loloyohe$ brew install garli
Should be ready to go.

Let's take a second to interpret the output from

105-238:OR6 loloyohe$ garli garli.conf
Running GARLI Version 2.01.1067 (18 May 2012)
->Single processor version for 64-bit OS<-
##############################################################
 This is GARLI 2.0, the first "official" release including 
          partitioned models.  It is a merging of
   official release 1.0 and beta version GARLI-PART 0.97
  Briefly, it includes models for nucleotides, amino acids,
 codons, and morphology-like characters, any of which can be 
  mixed together and applied to different subsets of data.

    General program usage is extensively documented here:
            http://www.nescent.org/wg/garli/
      see this page for details on partitioned usage:
  http://www.nescent.org/wg/garli/Partition_testing_version
   and this page for details on Mkv mophology model usage:
    http://www.nescent.org/wg/garli/Mkv_morphology_model
         PLEASE LET ME KNOW OF ANY PROBLEMS AT:
                garli.support@gmail.com
##############################################################
Compiled Apr 27 2015 16:06:23 using GNU gcc compiler version 4.9.2
Using NCL version 2.1.17

#######################################################
Reading config file garli.conf
###################################################
READING OF DATA
Attempting to read data file in Nexus format (using NCL):
Phyllos_OR6_Funct_tAlign_noStops.nex ...
Reading DATA block...storing implied block: TAXA
storing read block: DATA
 successful

###################################################
PARTITIONING OF DATA AND MODELS
GARLI data subset 1
CHARACTERS block #1 ("Untitled DATA Block 1")
Data read as Nucleotide data,
modeled as Codon data
Gaps or ambiguity codes found within codon for taxon Lsi_29_OR6.
Codons coded as missing for that taxon: 6 120 235 236 
Gaps or ambiguity codes found within codon for taxon Phe_43_OR6.
Codons coded as missing for that taxon: 6 
Gaps or ambiguity codes found within codon for taxon Sti_17_OR6.
Codons coded as missing for that taxon: 185 
Summary of data:
  35 sequences.
  6 constant characters.
  232 parsimony-informative characters.
  5 uninformative variable characters.
  243 total characters.
  243 unique patterns in compressed data matrix.
Pattern processing required < 1 second


###################################################
NOTE: Unlike many programs, the amount of system memory that Garli will
use can be controlled by the user.
(This comes from the availablememory setting in the configuration file.
Availablememory should NOT be set to more than the actual amount of 
physical memory that your computer has installed)

For this dataset:
 Mem level availablememory setting
  great     >= 150 MB
  good approx 149 MB to 99 MB
  low approx 98 MB to 42 MB
  very low approx 42 MB to 31 MB
the minimum required availablememory is 31 MB

You specified that Garli should use at most 512.0 MB of memory.

Garli will actually use approx. 224.4 MB of memory
**Your memory level is: great (you don't need to change anything)**

#######################################################
Found outgroup specification:  1

#######################################################
STARTING RUN

>>>Search rep 1 (of 2)<<<
MODEL REPORT - Parameters are at their INITIAL values (not yet optimized)
Model 1
  Number of states = 61 (codon data, standard code)
  Nucleotide Relative Rate Matrix Assumed by Codon Model:     2 rates (transition and transversion) K param = 4.0000
  Equilibrium State Frequencies: equal (1/61 = 0.01639, fixed)
  Rate Heterogeneity Model:
    4 nonsynonymous rate categories, rate and proportion of each estimated
     (this is effectively the M3 model of PAML)
      dN/dS Proportion
      0.2500 0.2500
      0.5000 0.2500
      0.7500 0.2500
      1.0000 0.2500

Starting with seed=287687

creating likelihood stepwise addition starting tree...
number of taxa added:
 4  5  6  7  8  9  10  11  12  13  14  15  16  17

We can set the exact same thing up in CIPRES. 

Thursday, January 22, 2015

Intention is to extend the ORA pipeline to scan for all chemosensory related genes.

Take 15 random V1Rs from the Young, et al (2010) paper.
-->Should you only take functional genes as training sequences?

At first I was aligning the sequences with ClustalOmega and trying to put the sequences in .msf format, as many HMMER tutorials show. However, I could not for the life of me figure out what was wrong and kept getting the following error:
hmmbuild vno.hmm vno_trainerSeqs.msf
Alignment input open failed.
   couldn't determine alignment input format
   while reading file vno_trainerSeqs.msf

So I gave up on .msf and tried PHYLIP--seemed to have worked fine

head vno_trainerSeqs.msf
 15  384
v1r-10    ----MLKLVIIENMAEIMLFSLDLLLFSTDIL----CFNFPSKMIKLPGF
v1r-06    ---------------------------------------MMNKNSRLYTD
v1r-04    ----------------------------------------------MAVD
v1r-12    -------------------------------------------------M
v1r-01    -------------------------------------MSAHGNSLKTTEE
v1r-14    ---------------------------------------------MLTYD
v1r-15    ---------------------------------------------MSSHK
v1r-03    ---------------------------------------------MDITE
v1r-02    -------------------------------------------MKMTSSN
v1r-08    ---------------------------------------------MSSAK
v1r-11    MVGDTLKLLSPL-MTRY----FFLLFYSTDSSDLNENQHPLDFDEMAFGK
v1r-05    ---------------------------------------------MDARN
v1r-13    ---------------------------------------------MASKD
v1r-07    -------MIHVD-RDSY----PL-----AGFSSSEDKYLSLTTDRKASRE
v1r-09    ---------------------------------------------MATGD

ITIQIFFYPQASFGISANTILLLFHIFTFVFSHRSKSIDMIISHLSLIHI
SNIRNTFFAEIGIGVSANSLLLLFNIFKLICGQRSRLTDLPIGLLSLINL
VAQGVSFLYQTGLGILGNSLLITLYLTSFLLGSKLKPTDLTIIHLALVHT
LSFKKAFYFQAGIGISANIFLLLWHIFTFFKDHKPKNHDLIICHLAFAHI
VALQILLLCQFGVGTVANVFLFVHNFSPVLTGSKQRPRQVILSHMAVANA
DFMCIFHKLQTIIGLFGNSFLLYLYILKLIINQRITLIDKICINLVFSNI
VGLEIVYLTLLLFGILGNMFLIYLQSLKFITDHRKRVINLIIINLALAHT
LSFGIAIVMQFSIGVSVNVFVFLFYAQIISTSYKASFSDLILAHLAFANT
LVVGILLFSQIVMGMLGNSSILFYYVILIFTGKHLTPKDLIIEHLTFANC
WETRIILVAQMGVGILGNTSLFCLCNFTLFTGQKVRCTDIILSQLALANS
VKSGISFLIQTGVGILGNSFLLCFYNLILFTGHKLRPTDLILSQLALANS
LGVGITYLLQSVVGLLGNISLIFYYLIIYYKKHKIKPMDLILMHLIIVNI
FAIGMIL-SQIMVGFLGNFFLLYHYSFLHFTRGMLQSTDLTLKHLTIANS
LVVGIILSLQTTFGILGNFSLLYHYLFLYFTGSRLRSTDLIVKNLIVANL
LVVGMVFLSQTILGILGNFSLLCHYLLLHFTGCRARCTDLILRHLTIANS

Alrighty. Will this now work with ORA?
It appears so! Actually is not too difficult.
[1] Put the alignment file of the training receptors and run "hmmbuild" to make the HMMER probability file.
hmmbuild vno.hmm vno_trainerSeqs.phy
# hmmbuild :: profile HMM construction from multiple sequence alignments
# HMMER 3.1b1 (May 2013); http://hmmer.org/
# Copyright (C) 2013 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# input alignment file:             vno_trainerSeqs.phy
# output HMM file:                  vno.hmm
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# idx name                  nseq  alen  mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------
1     vno_trainerSeqs         15   384   315     2.03  0.590 
[3] In the "scripts" folder of the ORA pipeline, append the vno.hmm text to the or.hmm text. Make sure there is a new line after the last "//" or it will not press properly.

[4] Delete "or.hmm.h3f", "or.hmm.h3i", "or.hmm.h3m", and "or.hmm.h3p". Then repress the new or.hmm:
hmmpress or.hmm
Looks like with V2R, the ORA is picking up things like glutamate receptors (GRM3), and GABA receptors. Wonder why....will need to think about and retrain. I'm assuming the introns are throwing it off.