Gave in and installed the new BLAST+ using Homebrew.
The sequences can't contain internal newlines (each sequence on a single line) and need a specifically formatted FASTA header:
head /Volumes/Ruficollis/ORA_annotations/C_sowelli_ORs_nonewline.fasta
>gb|deg7180000002010|OR7
CTGCACTCACCTATG....
>gb|deg7180000002011|OR51_PSEUDOGENE
sudo makeblastdb -in /Volumes/Ruficollis/ORA_annotations/C_sowelli_ORs_nonewline.fasta -out C_sowelli_ORS_blastdb -dbtype nucl -parse_seqids
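With -parse_seqids, makeblastdb takes the first word of each header as the sequence ID, so those IDs should be unique (as far as I know, recent BLAST+ versions refuse to build a database with duplicate IDs). Here is a small bash sketch that lists repeated IDs before running makeblastdb. The function name and the demo file are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print every FASTA ID (first word of a header, without ">")
# that appears more than once in the given file.
duplicate_ids() {
  grep '^>' "$1" | sed 's/^>//' | awk '{ print $1 }' | sort | uniq -d
}

# Made-up demo file: gb|deg1|OR7 is used twice
printf '>gb|deg1|OR7\nACGT\n>gb|deg2|OR51\nGGCC\n>gb|deg1|OR7\nTTAA\n' > ids_demo.fasta
duplicate_ids ids_demo.fasta
```

Empty output means every ID is unique.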
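"){print">
To strip the newlines inside each sequence (here for DR_091_Mobl_MOE_assembled.fasta) without leaving stray blank lines at the top or bottom of the file:
awk '/^>/ { if (seq) print ""; print; seq = 0; next } { printf "%s", $0; seq = 1 } END { if (seq) print "" }' DR_091_Mobl_MOE_assembled.fasta > joined.fasta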
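"){print">
A quick sanity check that a joined file really has one sequence line per record, i.e. lines alternate header, sequence, header, sequence. This is a minimal bash/awk sketch; the function name and demo files are hypothetical, and it assumes every record has a non-empty sequence.

```shell
#!/usr/bin/env bash
# Sketch: exit 0 if odd lines are all headers and even lines are all sequence,
# otherwise report the offending lines on stderr and exit 1.
check_one_line_fasta() {
  awk '
    NR % 2 == 1 && !/^>/ { printf("line %d: expected a header\n", NR) > "/dev/stderr"; bad = 1 }
    NR % 2 == 0 &&  /^>/ { printf("line %d: expected sequence\n", NR) > "/dev/stderr"; bad = 1 }
    END { if (NR % 2 == 1) bad = 1; exit (bad ? 1 : 0) }  # odd line count = last record has no sequence
  ' "$1"
}

# Made-up demo files
printf '>s1\nACGT\n>s2\nGG\n' > good_demo.fasta
printf '>s1\nAC\nGT\n' > bad_demo.fasta   # sequence still wrapped over two lines
check_one_line_fasta good_demo.fasta && echo "good_demo.fasta: OK"
check_one_line_fasta bad_demo.fasta || echo "bad_demo.fasta: still has wrapped lines"
```

Running it on joined.fasta before makeblastdb should catch leftover line breaks or blank lines.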
Headers in the renamed_DR_013_Mored_MOE FASTA look like this:
>gnl|some_ID|gene129
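If you need to rewrite headers into that >gnl|...|geneN shape, here is a minimal bash/awk sketch. The `some_ID` tag is just the placeholder from the header above, and the sequential gene numbering is an assumption for illustration, not necessarily how the renamed file was produced.

```shell
#!/usr/bin/env bash
# Sketch: replace every header with >gnl|<db_id>|gene<N>, numbering records
# 1, 2, 3, ... in file order. Sequence lines are passed through unchanged.
rename_headers() {
  local db_id=$1
  awk -v db="$db_id" '/^>/ { printf(">gnl|%s|gene%d\n", db, ++n); next } { print }'
}

# Made-up demo input
printf '>contig_A len=12\nACGTACGTACGT\n>contig_B\nTTGCA\n' | rename_headers some_ID > renamed_demo.fasta
cat renamed_demo.fasta
```

Since the original names are thrown away, it is worth keeping the unrenamed file around to map geneN back to its contig.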
Still didn't work.
OKAY GRUMBLE!!!! BLAST AND ITS ARCHAIC WAY OF LIFE >_<
perl ~/ora-1.9.1/scripts/or.pl -sequence DR_004_Arjam_MOE_assembled.part-01.fasta -a -d
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: die in _initialize, hmm profile not found at /Volumes/Ruficollis/ORA_annotations/to_analyze/or.hmm
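It doesn't like not being in its home directory: or.pl looks for its HMM profile (or.hmm) in the current working directory, so it has to be run from inside its own scripts folder, with full paths to the input and output files.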
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-01.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt1.fasta
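To avoid cd-ing back and forth, the cd can happen in a subshell so your own shell stays put. This is a hedged bash sketch: `run_in_dir` and the `ORA_DIR` default are hypothetical names, with the path guessed from where or.pl lives in this post.

```shell
#!/usr/bin/env bash
# Sketch: run a command from inside another directory without changing the
# current shell's working directory (the cd only happens in a subshell).
ORA_DIR=${ORA_DIR:-$HOME/ora-1.9.1/scripts}   # assumed install location

run_in_dir() {
  local dir=$1; shift
  ( cd "$dir" && "$@" )
}

# Example (commented out because it needs ORA installed). "$PWD" is expanded
# before the cd, and the > redirect is opened in the calling shell, so both
# paths stay relative to where you are now:
# run_in_dir "$ORA_DIR" perl or.pl -sequence "$PWD/DR_004_Arjam_MOE_assembled.part-01.fasta" -a -d > DR_004_ORs_pt1.fasta
```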
Okay, so this finally worked. Now time to get hacky. When I run it on all of my sequence data, I get this annoying Perl file-open error:
Too many open files at /Library/Perl/5.16/Bio/ORA.pm line 524.
The OS limits the number of files a single process (here, perl) can have open at one time. To see the current limits:
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 256
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 709
virtual memory (kbytes, -v) unlimited
We see that the open-file limit is 256. From what I've read, the max on Mac OS X is 1024, but PLEASE someone correct me if I'm wrong.
ulimit -n 1024
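Note that `ulimit -n` only changes the limit for the current shell and the processes started from it, so it has to be run in the same terminal as perl or.pl. `ulimit -Hn` shows the hard ceiling for the soft limit, and on OS X `sysctl kern.maxfilesperproc` should show the kernel's per-process cap. Below is a hedged bash sketch that raises the soft limit toward a target without going past the hard limit; the function name is hypothetical and 1024 is just the number from this post.

```shell
#!/usr/bin/env bash
# Sketch: raise the soft open-file limit toward a target, capped at the hard
# limit. Affects only this shell and its child processes.
raise_open_files() {
  local target=$1
  local soft hard
  soft=$(ulimit -Sn)
  hard=$(ulimit -Hn)
  # Never ask for more than the hard limit allows
  if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$target" ]; then
    target=$hard
  fi
  # Only raise, never lower, an existing soft limit
  if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$target" ]; then
    ulimit -Sn "$target"
  fi
  echo "open files: soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}

raise_open_files 1024
```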
Just to be safe, I am also going to split each of my sequence files into 100 FASTA files:
perl ~/Scripts/fasta-splitter.pl DR_004_Arjam_MOE_assembled.fasta -n-parts 100
perl ~/Scripts/fasta-splitter.pl DR_011_Arjam_VNO_assembled.fasta -n-parts 100
perl ~/Scripts/fasta-splitter.pl DR_013_Mored_MOE_assembled.fasta -n-parts 100
perl ~/Scripts/fasta-splitter.pl DR_013_Mored_VNO_assembled.fasta -n-parts 100
perl ~/Scripts/fasta-splitter.pl DR_091_Mobl_MOE_assembled.fasta -n-parts 100
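Before running anything on the parts, it is cheap to confirm the split didn't drop or duplicate records. A minimal bash sketch, assuming the parts are named <original>.part-NNN.fasta (the part-001 style names used below); the function names and demo files are made up.

```shell
#!/usr/bin/env bash
# Sketch: compare the number of FASTA records (header lines) in the original
# file with the total across all of its .part-*.fasta pieces.
count_records() { cat "$@" | grep -c '^>'; }

check_split() {
  local original=$1
  local base=${original%.fasta}
  local before after
  before=$(count_records "$original")
  after=$(count_records "$base".part-*.fasta)
  echo "$original: $before records before split, $after after"
  [ "$before" -eq "$after" ]
}

# Made-up demo: 3 records split into two parts
printf '>a\nAC\n>b\nGT\n>c\nTT\n' > demo.fasta
printf '>a\nAC\n>b\nGT\n' > demo.part-001.fasta
printf '>c\nTT\n' > demo.part-002.fasta
check_split demo.fasta && echo "split OK"
```

For the real data this would be e.g. `check_split DR_004_Arjam_MOE_assembled.fasta`.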
Then run the ORA pipeline on each part from a shell script, AND THEN cat them all together. Yes, I know... very hacky.
#this must be run from within the ORA scripts folder so it can access all of the HMM folders at once
sh run_ORA_DR_004MOE.sh
#What does this shell script look like?
#!/bin/bash
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-001.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt1.fasta
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-002.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt2.fasta
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-003.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt3.fasta
....
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-100.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt100.fasta
cat /Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs* > /Volumes/Ruficollis/ORA_annotations/DR_004_Arjam_MOE_ORs.fasta
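Typing 100 nearly identical lines invites typos, so the run script itself can be generated. A hedged bash sketch, assuming zero-padded part names (part-001 ... part-100) and the paths used in this post; the variable names are hypothetical, and `sample`/`prefix`/`script` need adjusting for the other four datasets.

```shell
#!/usr/bin/env bash
# Sketch: write run_ORA_DR_004MOE.sh with one or.pl call per split part.
in_dir=/Volumes/Ruficollis/ORA_annotations/to_analyze
out_dir=$in_dir/to_cat
sample=DR_004
prefix=DR_004_Arjam_MOE_assembled
n_parts=100
script=run_ORA_${sample}MOE.sh

{
  echo '#!/bin/bash'
  for i in $(seq 1 "$n_parts"); do
    part=$(printf '%03d' "$i")   # 1 -> 001, 42 -> 042, 100 -> 100
    echo "perl or.pl -sequence $in_dir/$prefix.part-$part.fasta -a -d >$out_dir/${sample}_ORs_pt$i.fasta"
  done
} > "$script"

echo "wrote $script"
```

The generated script still has to be run from inside the ORA scripts folder, and the cat line can be appended to it by hand.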
Okay, if that doesn't do the trick, I will be letting out another very loud grumble!