Wednesday, December 3, 2014


Gave in and installed the new blast+ using homebrew.

The sequence can have no newlines and needs to have a FASTA specific header:
head /Volumes/Ruficollis/ORA_annotations/C_sowelli_ORs_nonewline.fasta
>gb|deg7180000002010|OR7
CTGCACTCACCTATG....
>gb|deg7180000002011|OR51_PSEUDOGENE

sudo makeblastdb -in /Volumes/Ruficollis/ORA_annotations/C_sowelli_ORs_nonewline.fasta -out C_sowelli_ORS_blastdb -dbtype nucl -parse_seqids


awk '{if (substr($0,1,1)==">"){print "\n"$0} else printf("%s",$0);p++;}END{print "\n"}' DR_091_Mobl_MOE_assembled.fasta > joined.fasta

renamed_DR_013_Mored_MOE fasta:
>gnl|some_ID|gene129

Still didn't work.
OKAY GRUMBLE!!!! BLAST AND ITS ARCHAIC WAY OF LIFE >_<

perl ~/ora-1.9.1/scripts/or.pl -sequence DR_004_Arjam_MOE_assembled.part-01.fasta -a -d 

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: die in _initialize, hmm profile not found at /Volumes/Ruficollis/ORA_annotations/to_analyze/or.hmm

It doesn't like not being in its home directory when running the perl script.
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-01.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt1.fasta

Okay so this finally worked. Now time to get hacky. With all of my sequence data I get this annoying perl file open error:


Too many open files at /Library/Perl/5.16/Bio/ORA.pm line 524.

ulimit -a
The OS limits the number of files that perl can open at one time.
ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited

We see that the max is 256. From what I've read on a Mac OSX the max is 1024--but PLEASE someone correct me if I'm wrong.

ulimit -n 1024

Just to be safe, I am going to split my sequence data into 100 fasta files each:
perl ~/Scripts/fasta-splitter.pl DR_004_Arjam_MOE_assembled.fasta -n-parts 100
perl ~/Scripts/fasta-splitter.pl DR_011_Arjam_VNO_assembled.fasta -n-parts 100
perl ~/Scripts/fasta-splitter.pl DR_013_Mored_MOE_assembled.fasta -n-parts 100
perl ~/Scripts/fasta-splitter.pl DR_013_Mored_VNO_assembled.fasta -n-parts 100
perl ~/Scripts/fasta-splitter.pl DR_091_Mobl_MOE_assembled.fasta -n-parts 100

Then run the ORA pipeline in a shell script for each file. AND THEN cat them all together. Yes, I know..very hacky.

#this must be run from within the ORA scripts folder so it can access all of the HMM folders at once

sh run_ORA_DR_004MOE.sh

#What does this shell script look like?
#!/bin/bash
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-001.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt1.fasta
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-002.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt2.fasta
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-003.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt3.fasta
....
perl or.pl -sequence /Volumes/Ruficollis/ORA_annotations/to_analyze/DR_004_Arjam_MOE_assembled.part-100.fasta -a -d >/Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs_pt100.fasta

cat /Volumes/Ruficollis/ORA_annotations/to_analyze/to_cat/DR_004_ORs* >> /Volumes/Ruficollis/ORA_annotations/DR_004_Arjam_MOE_ORs.fasta

Okay if that doesn't do the trick then I will be letting out another very loud grumble!

Wednesday, November 19, 2014

Problem troubleshooted while installing Phylobayes on Mavericks by LM Davalos:

Problem: Parallel phylobayes (pb_mpi1.5a) does not compile from source on multi-processor mac running OS 10.9. After clearing out any errors having to do with open-mpi, these errors persist:

file BP2util.h
shell: fatal error: 'tr1/unordered_map' file not found
#include <tr1/unordered_map>

original #include <tr1/unordered_map> #tr1 is deprecated in 10.9 and will not compile
modified: #include <unordered_map>

shell: error: use of undeclared identifier 'tr1'
original: class BipartitionHashTable : public tr1::unordered_map<string,T> {};
modified: class BipartitionHashTable : public unordered_map<string,T> {};

file BP2Stat.h
shell: fatal error: 'tr1/unordered_map' file not found
#include <tr1/unordered_map>

original: #include <tr1/unordered_map>
modified: #include <unordered_map>
#define hash_map std::unordered_map

file PolyNode.cpp
shell: error: variable length array of non-POD element type 'string'
original:  string retval[degree]; #line 431
modified:  string *retval = new string[degree]; #line 431

Wednesday, September 24, 2014

perl ~/Scripts/blast_parser.pl <DR_004_Arjam_MOE_assembled_v_OR_VR_indices.tblastx >DR_004_Arjam_MOE_assembled_v_OR_VR_indices.tblastx.parsed2
OK, processed 8019 queries, 23298 hits

perl ~/Scripts/blast_parser.pl <DR_011_Arjam_VNO_assembled_v_OR_VR_indices.tblastx  >DR_011_Arjam_VNO_assembled_v_OR_VR_indices.tblastx.parsed2
OK, processed 8019 queries, 77307 hits

perl ~/Scripts/blast_parser.pl <DR_013_Mored_MOE_assembled_v_OR_VR_indices.tblastx  >DR_013_Arjam_MOE_assembled_v_OR_VR_indices.tblastx.parsed2
OK, processed 8019 queries, 251312 hits

perl ~/Scripts/blast_parser.pl <DR_013_Mored_VNO_assembled_v_OR_VR_indices.tblast  >DR_013_Arjam_VNO_assembled_v_OR_VR_indices.tblastx.parsed2
OK, processed 8019 queries, 232694 hits

perl ~/Scripts/blast_parser.pl <DR_091_Mobl_MOE_assembled_v_OR_VR_indices.tblastx  >DR_091_Mobl_MOE_assembled_v_OR_VR_indices.tblastx.parsed2
OK, processed 8019 queries, 96314 hits

Tuesday, September 16, 2014

#remove newlines
awk '{if (substr($0,1,1)==">"){print "\n"$0} else printf("%s",$0);p++;}END{print "\n"}' DR_091_Mobl_MOE_assembled.fasta > joined.fasta

If I try running BLASTALL with my transcriptome data as the database, I get this error:

[blastall 2.2.22] ERROR: SeqPortNew: lcl

All the names of the sequences are too similar so I have to rename them--I am just going to rename them by number.
awk '/^>/{print ">" ++i; next}{print}' < DR_091_Mobl_MOE_assembled.fasta > renamed_DR_091_Mobl_MOE_assembled.fasta

#export the path the blast
export PATH=$PATH:/Users/laurelyohe/blast-2.2.22/bin/

#format the database 
#because the transcriptome is bigger than the indices, it will serve as the database
formatdb -i renamed_DR_091_Mobl_MOE_assembled.fasta -o T -p F
#run blast
blastall -p tblastx -d renamed_DR_013_Mored_MOE_assembled.fasta -i OR_VR_indices.fasta -e 1e-06 -v 5 -b 5 -a 2 -o ./DR_013_Mored_MOE_assembled_v_OR_VR_indices.tblastx

#now we need to parse the BLAST script
perl ~/Scripts/parse_bls.pl --i DR_091_Mobl_MOE_assembled_v_OR_VR_indices.tblastx > DR_091_Mobl_MOE_assembled_v_OR_VR_indices.tblastx.parsed
#next combine the parsed file with the sequences
~/Scripts/extract_hsps_new.pl -i renamed_DR_091_Mobl_MOE_assembled.fasta -t DR_091_Mobl_MOE_assembled_v_OR_VR_indices.tblastx.parsed -s DR_091_Mormoops_blainvelli
Okay, I have finally accepted that my data is not fitting an Ornstein-Uhlenbeck model of evolution and it makes zero sense to be running the tests that I have been running. My alpha value is way too small for all of my traits compared to the height of my tree.

I have run the fitContinuous() function in the "geiger" package on the geometric mean.

Thursday, September 4, 2014

Thursday, August 21, 2014


#prior lambda = 6
#for the geometric mean (body size)
x<-data$GM
names(x)<-rownames(data)
prior <- make.prior(tree, model = "OU", param=list("dalpha"=list(),"dsig2"=list(),"dtheta"=list(),
                                         "dk"=list(lambda=6,kmax=length(tree$tip.label)*2-2),
                                          "dloc"=list(min=0,max=1),
                                         "dsb"=list(bmax=Inf,prob=tree$edge.length)))


fit.1<-bayou.mcmc(tree, x, model = "OU",prior= prior, ngen = 500000)
fit.2<-bayou.mcmc(tree, x,model = "OU", prior= prior, ngen = 500000)
chain.1 <- load.bayou(fit.1, save.Rdata=FALSE, cleanup=TRUE)
chain.2 <- load.bayou(fit.2, save.Rdata=FALSE, cleanup=TRUE)

combine<-combine.chains(chain.1, chain.2, burnin.prop = 0.2)

plotSimmap.mcmc(tree,combine, burnin = .1, fsize = 0.5,circle.lwd = 0.5 )

> effectiveSize(combine$lnL)
    var1
47696.89
> effectiveSize(combine$alpha)
    var1
17668.65
> effectiveSize(combine$k)
    var1
200.3831

http://andrewgelman.com/2010/10/29/could_someone_p/

#GM
#lambda=15
prior <- make.prior(tree, dists=list(dalpha="dlnorm", dsig2="dlnorm",dsb="dsb", dk="cdpois",
                                     dtheta="dnorm"), param=list(dalpha=list(meanlog=-5, sdlog=2),
                                     dsig2=list(meanlog=-1, sdlog=5), dk=list(lambda=15, kmax=200),
                                     dsb=list(bmax=Inf,prob=tree$edge.length),
                                     dtheta=list(mean=mean(x), sd=2)))



fit.1<-bayou.mcmc(tree, x, model = "OU",prior= prior, ngen = 500000)
fit.2<-bayou.mcmc(tree, x,model = "OU", prior= prior, ngen = 500000)
chain.1 <- load.bayou(fit.1, save.Rdata=FALSE, cleanup=TRUE)
chain.2 <- load.bayou(fit.2, save.Rdata=FALSE, cleanup=TRUE)

combine<-combine.chains(chain.1, chain.2, burnin.prop = 0.2)

> effectiveSize(combine$lnL)
    var1
54733.55
> effectiveSize(combine$alpha)
    var1
21377.18
> effectiveSize(combine$k)
    var1
460.0945
> median(combine$alpha)
[1] 0.005785393
> median(combine$k)
[1] 1
> median(combine$lnL)
[1] -226.6656
> median(combine$sig2)
[1] 0.1995975
> median(combine$ntheta)
[1] 2
> median(combine$theta)

plotSimmap.mcmc(tree,combine, burnin = 0, fsize = 0.5,circle.lwd = 0.5 )


#pPC1, lambda = 6
x<-data$PC1
names(x)<-rownames(data)
prior <- make.prior(tree, model = "OU", param=list("dalpha"=list(),"dsig2"=list(),"dtheta"=list(),
                                         "dk"=list(lambda=6,kmax=length(tree$tip.label)*2-2),
                                          "dloc"=list(min=0,max=1),
                                         "dsb"=list(bmax=Inf,prob=tree$edge.length)))
> effectiveSize(combine$lnL)
    var1
77556.68
> effectiveSize(combine$alpha)
    var1
750.1128
> effectiveSize(combine$k)
    var1
1098.867
> median(combine$alpha)
[1] 0.01054099
> median(combine$k)
[1] 5
> median(combine$lnL)
[1] -232.5772
> median(combine$sig2)
[1] 0.2309304
> median(combine$ntheta)
[1] 6

#pPC1
#lambda=15
prior <- make.prior(tree, dists=list(dalpha="dlnorm", dsig2="dlnorm",dsb="dsb", dk="cdpois",
                                     dtheta="dnorm"), param=list(dalpha=list(meanlog=-5, sdlog=2),
                                     dsig2=list(meanlog=-1, sdlog=5), dk=list(lambda=15, kmax=200),
                                     dsb=list(bmax=Inf,prob=tree$edge.length),
                                     dtheta=list(mean=mean(x), sd=2)))

fit.1<-bayou.mcmc(tree, x, model = "OU",prior= prior, ngen = 500000, plot.freq = NULL)
fit.2<-bayou.mcmc(tree, x,model = "OU", prior= prior, ngen = 500000, plot.freq = NULL)
chain.1 <- load.bayou(fit.1, save.Rdata=FALSE, cleanup=TRUE)
chain.2 <- load.bayou(fit.2, save.Rdata=FALSE, cleanup=TRUE)

combine<-combine.chains(chain.1, chain.2, burnin.prop = 0.2)

> effectiveSize(combine$lnL)
    var1
46687.59
> effectiveSize(combine$alpha)
    var1
1882.031
> effectiveSize(combine$k)
    var1
1916.088
> median(combine$alpha)
[1] 0.01097906
> median(combine$k)
[1] 6
> median(combine$lnL)
[1] -682.1394
> median(combine$sig2)
[1] 187.9879
> median(combine$ntheta)
[1] 7

Tuesday, August 19, 2014

#installing Fasta_reader (needed to run script for stats on Trinity assembly)
#first need to install npm
brew install node
vi server.js
#paste the following
var http = require('http'); http.createServer(function (req, res) { res.writeHead(200, {'Content-Type': 'text/plain'}); res.end('Hello World\n'); }).listen(8124, "127.0.0.1"); console.log('Server running at http://127.0.0.1:8124/');
#ESC wq!

#to check if it works
node server.js

npm install fastareader

WOOPS! nevermind! All of this stuff is already installed with Trinity. To run TrinityStats.pl, it cannot be run alone. You must navigate to the directory in which it is found.
###################
#   Ar_jam MOE    #
###################
#combine data from Lanes 1 & 2
cat Arjam_MOE* >> DR_004_Arjam_MOE_assembled.fasta
sudo perl /Volumes/Spare/transcriptomes/Trinity/util/TrinityStats.pl /Volumes/Spare/transcriptomes/MOE_VNO_transcriptomes/Trinity_assembly/DR_004_Arjam_MOE_assembled.fasta
################################
## Counts of transcripts, etc.
################################
Total trinity transcripts: 140189
Total trinity components: 63956
Percent GC: 55.16

########################################
Stats based on ALL transcript contigs:
########################################

Contig N10: 4088
Contig N20: 3085
Contig N30: 2434
Contig N40: 1927
Contig N50: 1502

Median contig length: 426
Average contig: 826.50
Total assembled bases: 115865619


#####################################################
## Stats based on ONLY LONGEST ISOFORM per COMPONENT:
#####################################################

Contig N10: 3860
Contig N20: 2862
Contig N30: 2212
Contig N40: 1726
Contig N50: 1310

Median contig length: 440
Average contig: 793.55
Total assembled bases: 50752060 
###################
#   Ar_jam VNO    #
###################
#combine data from Lanes 1 & 2
cat Arjam_VNO* >> DR_011_Arjam_VNO_assembled.fasta
sudo perl /Volumes/Spare/transcriptomes/Trinity/util/TrinityStats.pl /Volumes/Spare/transcriptomes/MOE_VNO_transcriptomes/Trinity_assembly/DR_011_Arjam_VNO_assembled.fasta
################################
## Counts of transcripts, etc.
################################
Total trinity transcripts: 138142
Total trinity components: 61477
Percent GC: 55.51

########################################
Stats based on ALL transcript contigs:
########################################

Contig N10: 4735
Contig N20: 3550
Contig N30: 2822
Contig N40: 2240
Contig N50: 1741

Median contig length: 455
Average contig: 920.11
Total assembled bases: 127106357


#####################################################
## Stats based on ONLY LONGEST ISOFORM per COMPONENT:
#####################################################

Contig N10: 3855
Contig N20: 2871
Contig N30: 2270
Contig N40: 1814
Contig N50: 1436

Median contig length: 467
Average contig: 848.36

Total assembled bases: 52154403
###################
#   Mo_red MOE    #
###################
#combine data from Lanes 1 & 2
cat Mored_MOE* >> DR_013_Mored_MOE_assembled.fasta
sudo perl /Volumes/Spare/transcriptomes/Trinity/util/TrinityStats.pl /Volumes/Spare/transcriptomes/MOE_VNO_transcriptomes/Trinity_assembly/DR_013_Mored_MOE_assembled.fasta
################################
## Counts of transcripts, etc.
################################
Total trinity transcripts: 245573
Total trinity components: 100799
Percent GC: 56.14

########################################
Stats based on ALL transcript contigs:
########################################

Contig N10: 5311
Contig N20: 4036
Contig N30: 3304
Contig N40: 2703
Contig N50: 2118

Median contig length: 406
Average contig: 960.45
Total assembled bases: 235861464


#####################################################
## Stats based on ONLY LONGEST ISOFORM per COMPONENT:
#####################################################

Contig N10: 4140
Contig N20: 2880
Contig N30: 2105
Contig N40: 1512
Contig N50: 1053

Median contig length: 377
Average contig: 685.78

Total assembled bases: 69126334

###################
#   Mo_red VNO    #
###################
#combine data from Lanes 1 & 2
cat Mored_VNO* >> DR_013_Mored_VNO_assembled.fasta
sudo perl $TRINITY_HOME/util/TrinityStats.pl DR_013_Mored_VNO_assembled.fasta
################################
## Counts of transcripts, etc.
################################
Total trinity transcripts: 163233
Total trinity components: 70319
Percent GC: 55.74

########################################
Stats based on ALL transcript contigs:
########################################

Contig N10: 4648
Contig N20: 3450
Contig N30: 2778
Contig N40: 2186
Contig N50: 1709

Median contig length: 430
Average contig: 883.37
Total assembled bases: 144195048


#####################################################
## Stats based on ONLY LONGEST ISOFORM per COMPONENT:
#####################################################

Contig N10: 3865
Contig N20: 2811
Contig N30: 2110
Contig N40: 1605
Contig N50: 1187

Median contig length: 427
Average contig: 754.88

Total assembled bases: 53082440


###################
#   Mo_bla MOE    #
###################
cat Mobl_MOE* >> DR_091_Mobl_MOE_assembled.fasta
sudo perl $TRINITY_HOME/util/TrinityStats.pl DR_091_Mobl_MOE_assembled.fasta
################################
## Counts of transcripts, etc.
################################
Total trinity transcripts: 249551
Total trinity components: 115871
Percent GC: 54.01

########################################
Stats based on ALL transcript contigs:
########################################

Contig N10: 4786
Contig N20: 3549
Contig N30: 2741
Contig N40: 2123
Contig N50: 1598

Median contig length: 335
Average contig: 763.67
Total assembled bases: 190573978


#####################################################
## Stats based on ONLY LONGEST ISOFORM per COMPONENT:
#####################################################

Contig N10: 3761
Contig N20: 2670
Contig N30: 1971
Contig N40: 1426
Contig N50: 989

Median contig length: 344
Average contig: 636.73

Total assembled bases: 73778325

Monday, August 18, 2014

###################
#Lane 2 Ar_jam MOE#
###################
gunzip pair*
cat pair.DR004_Artibeus_Jamaicensis_MOE_ACAGTG_L002_R1* >> Arjam_MOE_L2_R1.fq
cat pair.DR004_Artibeus_Jamaicensis_MOE_ACAGTG_L002_R2* >> Arjam_MOE_L2_R2.fq
fastq_quality_filter -i Arjam_MOE_L2_R1.fq -o Arjam_MOE_R1_L2_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 15317036 reads.
Output: 14847480 reads.
discarded 469556 (3%) low-quality reads.
fastq_quality_filter -i Arjam_MOE_L2_R2.fq -o Arjam_MOE_R2_L2_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 15317036 reads.
Output: 14608448 reads.
discarded 708588 (4%) low-quality reads.
~/Scripts/both.py Arjam_MOE_R1_L2_fastxtrimmed.fastq Arjam_MOE_R2_L2_fastxtrimmed.fastq
#run trinity
Trinity.pl --seqType fq --left Arjam_MOE_R1_L2_fastxtrimmed.fastq.both --right Arjam_MOE_R2_L2_fastxtrimmed.fastq.both --CPU 4 --JM 20G --output output_Lane2

###################
#Lane 1 Ar_jam VNO#
###################
gunzip pair*
cat pair.DR011_Artibeus_Jamaicensis_VNO_TGACCA_L001_R1* >> Arjam_VNO_L1_R1.fq
cat pair.DR011_Artibeus_Jamaicensis_VNO_TGACCA_L001_R2* >> Arjam_VNO_L1_R2.fq
mv Arjam_VNO_L1_R* /Volumes/Spare/transcriptomes/MOE_VNO_transcriptomes/Trinity_assembly/to_assemble/DR_011_Arjam_VNO/
fastq_quality_filter -i Arjam_VNO_L1_R1.fq -o Arjam_VNO_R1_L1_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 13582991 reads.
Output: 13087556 reads.
discarded 495435 (3%) low-quality reads.
fastq_quality_filter -i Arjam_VNO_L1_R2.fq -o Arjam_VNO_R2_L1_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 13582991 reads.
Output: 12665813 reads.
discarded 917178 (6%) low-quality reads.

~/Scripts/both.py Arjam_VNO_R1_L1_fastxtrimmed.fastq Arjam_VNO_R2_L1_fastxtrimmed.fastq

Trinity.pl --seqType fq --left Arjam_VNO_R1_L1_fastxtrimmed.fastq.both --right Arjam_VNO_R2_L1_fastxtrimmed.fastq.both --CPU 4 --JM 20G --output output_Lane1

###################
#Lane 2 Ar_jam VNO#
###################
gunzip pair*
cat pair.DR011_Artibeus_Jamaicensis_VNO_TGACCA_L002_R1* >> Arjam_VNO_L2_R1.fq
cat pair.DR011_Artibeus_Jamaicensis_VNO_TGACCA_L002_R2* >> Arjam_VNO_L2_R2.fq
mv Arjam_VNO_L2_R* /Volumes/Spare/transcriptomes/MOE_VNO_transcriptomes/Trinity_assembly/to_assemble/DR_011_Arjam_VNO/
fastq_quality_filter -i Arjam_VNO_L2_R1.fq -o Arjam_VNO_R1_L2_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 13746292 reads.
Output: 13242643 reads.
discarded 503649 (3%) low-quality reads.
fastq_quality_filter -i Arjam_VNO_L2_R2.fq -o Arjam_VNO_R2_L2_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 13746292 reads.
Output: 12800383 reads.
discarded 945909 (6%) low-quality reads.

~/Scripts/both.py Arjam_VNO_R1_L2_fastxtrimmed.fastq Arjam_VNO_R2_L2_fastxtrimmed.fastq

Trinity.pl --seqType fq --left Arjam_VNO_R1_L2_fastxtrimmed.fastq.both --right Arjam_VNO_R2_L2_fastxtrimmed.fastq.both --CPU 4 --JM 20G --output output_Lane2

###################
#Lane 1 Mo_red MOE#
###################
gunzip pair*
cat pair.DR013_Monophyllus_Redmani_MOE_CAGATC_L001_R1* >> Mored_MOE_L1_R1.fq
cat pair.DR013_Monophyllus_Redmani_MOE_CAGATC_L001_R2* >> Mored_MOE_L1_R2.fq
mv Mored_MOE_L1_R* /Volumes/Spare/transcriptomes/MOE_VNO_transcriptomes/Trinity_assembly/to_assemble/DR_013_Mored_MOE/

fastq_quality_filter -i Mored_MOE_L1_R1.fq -o Mored_MOE_R1_L1_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 14546140 reads.
Output: 14205942 reads.
discarded 340198 (2%) low-quality reads.

fastq_quality_filter -i Mored_MOE_L1_R2.fq -o Mored_MOE_R2_L1_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 14546140 reads.
Output: 13998979 reads.
discarded 547161 (3%) low-quality reads.

~/Scripts/both.py Mored_MOE_R1_L1_fastxtrimmed.fastq Mored_MOE_R2_L1_fastxtrimmed.fastq

Trinity.pl --seqType fq --left Mored_MOE_R1_L1_fastxtrimmed.fastq.both --right Mored_MOE_R2_L1_fastxtrimmed.fastq.both --CPU 4 --JM 20G --output output_Lane1
###################
#Lane 2 Mo_red MOE#
###################
gunzip pair*
cat pair.DR013_Monophyllus_Redmani_MOE_CAGATC_L002_R1* >> Mored_MOE_L2_R1.fq
cat pair.DR013_Monophyllus_Redmani_MOE_CAGATC_L002_R2* >> Mored_MOE_L2_R2.fq
mv Mored_MOE_L2_R* /Volumes/Spare/transcriptomes/MOE_VNO_transcriptomes/Trinity_assembly/to_assemble/DR_013_Mored_MOE/

fastq_quality_filter -i Mored_MOE_L2_R1.fq -o Mored_MOE_R1_L2_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v


Quality cut-off: 20
Minimum percentage: 80
Input: 14719223 reads.
Output: 14370971 reads.
discarded 348252 (2%) low-quality reads.

fastq_quality_filter -i Mored_MOE_L2_R2.fq -o Mored_MOE_R2_L2_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 14719223 reads.
Output: 14152391 reads.
discarded 566832 (3%) low-quality reads.

~/Scripts/both.py Mored_MOE_R1_L2_fastxtrimmed.fastq Mored_MOE_R2_L2_fastxtrimmed.fastq

Trinity.pl --seqType fq --left Mored_MOE_R1_L2_fastxtrimmed.fastq.both --right Mored_MOE_R2_L2_fastxtrimmed.fastq.both --CPU 4 --JM 20G --output output_Lane2

###################
#Lane 1 Mo_red VNO#
###################
gunzip pair*
cat pair.DR013_Monophyllus_Redmani_VNO_AGTCAA_L001_R1* >> Mored_VNO_L1_R1.fq
cat pair.DR013_Monophyllus_Redmani_VNO_AGTCAA_L001_R2* >> Mored_VNO_L1_R2.fq
mv Mored_VNO_L1_R* /Volumes/Spare/transcriptomes/MOE_VNO_transcriptomes/Trinity_assembly/to_assemble/DR_013_Mored_VNO/

fastq_quality_filter -i Mored_VNO_L1_R1.fq -o Mored_VNO_R1_L1_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 14001874 reads.
Output: 13728472 reads.
discarded 273402 (1%) low-quality reads
fastq_quality_filter -i Mored_VNO_L1_R2.fq -o Mored_VNO_R2_L1_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 14001874 reads.
Output: 13444029 reads.
discarded 557845 (3%) low-quality reads.

~/Scripts/both.py Mored_VNO_R1_L1_fastxtrimmed.fastq Mored_VNO_R2_L1_fastxtrimmed.fastq

Trinity.pl --seqType fq --left Mored_VNO_R1_L1_fastxtrimmed.fastq.both --right Mored_VNO_R2_L1_fastxtrimmed.fastq.both --CPU 4 --JM 20G --output output_Lane1
###################
#Lane 2 Mo_red VNO#
###################
gunzip pair*
cat pair.DR013_Monophyllus_Redmani_VNO_AGTCAA_L002_R1* >> Mored_VNO_L2_R1.fq
cat pair.DR013_Monophyllus_Redmani_VNO_AGTCAA_L002_R2* >> Mored_VNO_L2_R2.fq
mv Mored_VNO_L2_R* /Volumes/Spare/transcriptomes/MOE_VNO_transcriptomes/Trinity_assembly/to_assemble/DR_013_Mored_VNO/

fastq_quality_filter -i Mored_VNO_L2_R1.fq -o Mored_VNO_R1_L2_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 14176309 reads.
Output: 13892340 reads.
discarded 283969 (2%) low-quality reads.
fastq_quality_filter -i Mored_VNO_L2_R2.fq -o Mored_VNO_R2_L2_fastxtrimmed.fastq -q 20 -p 80 -Q 33 
-v
Quality cut-off: 20
Minimum percentage: 80
Input: 14176309 reads.
Output: 13594457 reads.
discarded 581852 (4%) low-quality reads.


~/Scripts/both.py Mored_VNO_R1_L2_fastxtrimmed.fastq Mored_VNO_R2_L2_fastxtrimmed.fastq

Trinity.pl --seqType fq --left Mored_VNO_R1_L2_fastxtrimmed.fastq.both --right Mored_VNO_R2_L2_fastxtrimmed.fastq.both --CPU 4 --JM 20G --output output_Lane2
###################
#Lane 1 Mo_bla MOE#
###################
gunzip pair*
cat pair.DR091_Mormoops_Blainvelli_MOE_CGATGT_L001_R1* >> Mobla_MOE_L1_R1.fq
cat pair.DR091_Mormoops_Blainvelli_MOE_CGATGT_L001_R2* >> Mobla_MOE_L1_R2.fq
mv Mobla_MOE_L1_R* /Volumes/Spare/transcriptomes/MOE_VNO_transcriptomes/Trinity_assembly/to_assemble/DR_091_Mobl_MOE/

fastq_quality_filter -i Mobla_MOE_L1_R1.fq -o Mobl_MOE_R1_L1_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 16080335 reads.
Output: 15736163 reads.

discarded 344172 (2%) low-quality reads.
fastq_quality_filter -i Mobla_MOE_L1_R2.fq -o Mobl_MOE_R2_L1_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 16080335 reads.
Output: 15462533 reads.

discarded 617802 (3%) low-quality reads.
~/Scripts/both.py Mobl_MOE_R1_L1_fastxtrimmed.fastq Mobl_MOE_R2_L1_fastxtrimmed.fastq

Trinity.pl --seqType fq --left Mobl_MOE_R1_L1_fastxtrimmed.fastq.both --right Mobl_MOE_R2_L1_fastxtrimmed.fastq.both --CPU 4 --JM 20G --output output_Lane1
###################
#Lane 2 Mo_bla MOE#
###################
gunzip pair*
cat pair.DR091_Mormoops_Blainvelli_MOE_CGATGT_L002_R1* >> Mobla_MOE_L2_R1.fq
cat pair.DR091_Mormoops_Blainvelli_MOE_CGATGT_L002_R2* >> Mobla_MOE_L2_R2.fq
mv Mobla_MOE_L2_R* /Volumes/Spare/transcriptomes/MOE_VNO_transcriptomes/Trinity_assembly/to_assemble/DR_091_Mobl_MOE/

fastq_quality_filter -i Mobla_MOE_L2_R1.fq -o Mobl_MOE_R1_L2_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 16263150 reads.
Output: 15909002 reads.

discarded 354148 (2%) low-quality reads.
fastq_quality_filter -i Mobla_MOE_L2_R2.fq -o Mobl_MOE_R2_L2_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 16263150 reads.
Output: 15621798 reads.

discarded 641352 (3%) low-quality reads.
~/Scripts/both.py Mobl_MOE_R1_L2_fastxtrimmed.fastq Mobl_MOE_R2_L2_fastxtrimmed.fastq

Trinity.pl --seqType fq --left Mobl_MOE_R1_L2_fastxtrimmed.fastq.both --right Mobl_MOE_R2_L2_fastxtrimmed.fastq.both --CPU 4 --JM 20G --output output_Lane2

Friday, August 15, 2014

Looks like the 24 hours of redoing the libc++ is working out.

CMD finished (161 seconds)
Thursday, August 14, 2014: 21:56:02     CMD: touch /Volumes/Spare/transcriptomes/MOE_VNO_transcriptomes/Trinity_assembly/to_assemble/DR_004_Arjam_MOE/output_4/chrysalis/file_partitioning.ok
CMD finished (0 seconds)
---------------------------------------------------
----------- Chrysalis: QuantifyGraph --------------
-- (Integrate mapped reads into de Bruijn graph) --
---------------------------------------------------

Thursday, August 14, 2014: 21:56:03     CMD: /usr/local/Cellar/trinity/r20131110/trinity-plugins/parafly/bin/ParaFly -c /Volumes/Spare/transcriptomes/MOE_VNO_transcriptomes/Trinity_assembly/to_assemble/DR_004_Arjam_MOE/output_4/chrysalis/quantifyGraph_commands -CPU 4 -failed_cmds failed_quantify_graph_commands.82191.txt -v -shuffle
Number of  Commands: 57201

succeeded(48273)   84.3919% completed.

...several hours later...

Butterfly finished too!
72,120 transcripts were assembled. Are there any ORs? Gah, I hope so... :-|

Thursday, August 14, 2014

brew install https://raw.github.com/Homebrew/homebrew-science/master/fastx_toolkit.rb
brew link --overwrite fastx_toolkit

Linking /usr/local/Cellar/fastx_toolkit/0.0.14... 
Error: Could not symlink include/gtextutils/gtextutils/container_join.h
/usr/local/include/gtextutils/gtextutils is not writable.

brew install https://raw.github.com/Homebrew/homebrew-science/master/fastx_toolkit.rb
######################################################################## 100.0%
Warning: fastx_toolkit-0.0.14 already installed, it's just not linked

sudo chown -R laurelyohe /usr/local/include
brew link --overwrite fastx_toolkit

Alright! I think it is installed. I have avoided using fastx-toolkit because it is such a pain in the butt to install, but alas, the uncertain has become certain. Why am I doing this? Because I am getting these alarming warning messages while Trinity is running. It seems to carry on regardless but it doesn't seem right.

...
warning: ignoring read XXXX since it cannot decipher if /1 or /2 of a pair.
Error: note that there were 4329123 reads that could not be deciphered as being /1 or /2 of a PE fragment. Hopefully, these were SE reads that should have been ignored. Otherwise, please research this further.

Okay, awesome. Seems like someone else ran into this problem:

After looking at what my input files were before and after trimming, it is clear that the fastq files have lost track of being from the right or left pair. Uggh! I knew that girl's Trinity help was too good to be true.

head Arjam_MOE_R1.fq
@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292 1:N:0:ACAGTG
GCCCTTGCCGCCAGCTGGCGCGGCCACGCAGCGGCTGAGCGAGTCTAGGCGCGCGCTGTACTCGGTGAACTTCTTTTGGTAGCGCAGAGCAGTCATTTGC
+
CCCFFFFFHHHHHJJJIJJJJJJJJJJJJJJIHHFDDCDDDDDDDDDDDCBBDBBBDDDDDEDDD@DDDDDDDDDDDDDACCDDDDDDDDDDDDDDEEED
@HWI-ST885:197:HA2RPADXX:1:1101:1438:2386 1:N:0:ACAGTG
GGGTGGAGCAAATTCTGGGCCAAAAGTAGTAGGTTCTGAAGGAGGAGGCAATGGTGGTGGTGGTGGACGTGGTCTCCTGAAGTGTCTGCGACTCTCACCA
+
CCCFFFFFDHHHHJGGHGIGIJJJHIFHIGIIJJJIIIIGHGGHI?FGHIGIIJ8@=5BEHHH9BEFEDCDDBDDDDDDCCD>@CCDDDDBDBBDDCCBC
@HWI-ST885:197:HA2RPADXX:1:1101:1310:2396 1:N:0:ACAGTG
CTTGGGTCGTGAGTGAGAACAGGCTGGTAGACGGGGCGCTCGCCGAAGGCTGGGATGAAGTCCCGAAACTAAACCCACCAGCGCTGGATGAGGAGAAAGA

head Arjam_MOE_R2.fq
@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292 2:N:0:ACAGTG
CAACGGCGCGCTGCAGGCCCGGCTCGCCGCGCTGCACAAGGCTTTCAAGAAGGAGGCTCTGCGCGCAGGCAAGCGCGAGAAGTCGCTGGTGGCGCAGCTG
+
CCCFFFFFHHHHHJJJJJJIIJIEIBEHBBBBBDDD<CDDDDDDDCCDDDDDD?ABBBDDDDDDDBBDD@DDDDDDD<BBDDDDDDDBB<CBABD@BDDC
@HWI-ST885:197:HA2RPADXX:1:1101:1438:2386 2:N:0:ACAGTG
CCTGTTTCTCGCCTGGTGAGAGTCGCAGACACTTCAGGAGACCACGTCCACCACCACCACCATTGCCTCCTCCTTCAGAACCTACTACTTTTGGCCCAGA
+
BBBFFFFFHHGHHIJJGHHHHIGHIIIIJJJJJJJJJJIJIJFJJGHIIHJGHJJEHHHFFFFEEEEEEDDDDDDDDDDDDDDDDDDDDDEDDDDDDBDD
@HWI-ST885:197:HA2RPADXX:1:1101:1310:2396 2:N:0:ACAGTG
CAGGCGGGTTCACGTTTGGCACCGCAAAGACGGCGACAACCACCCCTGCCACCGGCTTTTCTTTCTCCTCATCCAGCGCTGGTGGGTTTAGTTTCGGGAC

Okay see how there is a 1 and a 2 in the title line?

head Arjam_MOE_R1_trim.fq
@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292
GCCCTTGCCGCCAGCTGGCGCGGCCACGCAGCGGCTGAGCGAGTCTAGGCGCGCGCTGTACTCGGTGAACTTCTTTTGGTAGCGCAGAGCAGTCATTTGC
+
CCCFFFFFHHHHHJJJIJJJJJJJJJJJJJJIHHFDDCDDDDDDDDDDDCBBDBBBDDDDDEDDD@DDDDDDDDDDDDDACCDDDDDDDDDDDDDDEEED
@HWI-ST885:197:HA2RPADXX:1:1101:1438:2386
GGGTGGAGCAAATTCTGGGCCAAAAGTAGTAGGTTCTGAAGGAGGAGGCAATGGTGGTGGTGGTGGACGTGGTCTCCTGAAGTGTCTGCGACTCTCACCA
+
CCCFFFFFDHHHHJGGHGIGIJJJHIFHIGIIJJJIIIIGHGGHI?FGHIGIIJ8@=5BEHHH9BEFEDCDDBDDDDDDCCD>@CCDDDDBDBBDDCCBC
@HWI-ST885:197:HA2RPADXX:1:1101:1310:2396
CTTGGGTCGTGAGTGAGAACAGGCTGGTAGACGGGGCGCTCGCCGAAGGCTGGGATGAAGTCCCGAAACTAAACCCACCAGCGCTGGATGAGGAGAAAGA

head Arjam_MOE_R2_trim.fq
@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292
CAACGGCGCGCTGCAGGCCCGGCTCGCCGCGCTGCACAAGGCTTTCAAGAAGGAGGCTCTGCGCGCAGGCAAGCGCGAGAAGTCGCTGGTGGCGCAGCTG
+
CCCFFFFFHHHHHJJJJJJIIJIEIBEHBBBBBDDD<CDDDDDDDCCDDDDDD?ABBBDDDDDDDBBDD@DDDDDDD<BBDDDDDDDBB<CBABD@BDDC
@HWI-ST885:197:HA2RPADXX:1:1101:1438:2386
CCTGTTTCTCGCCTGGTGAGAGTCGCAGACACTTCAGGAGACCACGTCCACCACCACCACCATTGCCTCCTCCTTCAGAACCTACTACTTTTGGCCCAGA
+
BBBFFFFFHHGHHIJJGHHHHIGHIIIIJJJJJJJJJJIJIJFJJGHIIHJGHJJEHHHFFFFEEEEEEDDDDDDDDDDDDDDDDDDDDDEDDDDDDBDD
@HWI-ST885:197:HA2RPADXX:1:1101:1310:2396
CAGGCGGGTTCACGTTTGGCACCGCAAAGACGGCGACAACCACCCCTGCCACCGGCTTTTCTTTCTCCTCATCCAGCGCTGGTGGGTTTAGTTTCGGGAC

See how it goes away? Bollocks! This happens after running python ~/Scripts/q-trim.py.

Let's see if this strips it.
fastq_quality_filter -i Arjam_MOE_R1.fq -o Arjam_MOE_R1_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 15149790 reads.
Output: 14691119 reads.

head Arjam_MOE_R1_fastxtrimmed.fastq 
@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292 1:N:0:ACAGTG
GCCCTTGCCGCCAGCTGGCGCGGCCACGCAGCGGCTGAGCGAGTCTAGGCGCGCGCTGTACTCGGTGAACTTCTTTTGGTAGCGCAGAGCAGTCATTTGC
+
CCCFFFFFHHHHHJJJIJJJJJJJJJJJJJJIHHFDDCDDDDDDDDDDDCBBDBBBDDDDDEDDD@DDDDDDDDDDDDDACCDDDDDDDDDDDDDDEEED
@HWI-ST885:197:HA2RPADXX:1:1101:1438:2386 1:N:0:ACAGTG
GGGTGGAGCAAATTCTGGGCCAAAAGTAGTAGGTTCTGAAGGAGGAGGCAATGGTGGTGGTGGTGGACGTGGTCTCCTGAAGTGTCTGCGACTCTCACCA
+
CCCFFFFFDHHHHJGGHGIGIJJJHIFHIGIIJJJIIIIGHGGHI?FGHIGIIJ8@=5BEHHH9BEFEDCDDBDDDDDDCCD>@CCDDDDBDBBDDCCBC
@HWI-ST885:197:HA2RPADXX:1:1101:1310:2396 1:N:0:ACAGTG
CTTGGGTCGTGAGTGAGAACAGGCTGGTAGACGGGGCGCTCGCCGAAGGCTGGGATGAAGTCCCGAAACTAAACCCACCAGCGCTGGATGAGGAGAAAGA

Cool! Looks like it kept the extended header on.

fastq_quality_filter -i Arjam_MOE_R2.fq -o Arjam_MOE_R2_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 15149790 reads.
Output: 14464395 reads.
discarded 685395 (4%) low-quality reads.

For once something worked. Oh yeah, just as a side note if you have illumina reads you must put -Q 33 parameter. This of course, is an undocumented fix discovered by users. Three+ years after this discovery, -Q remains to be documented in the fastx-toolkit. 

I still am going to do the pair matching before running trinity again.

~/Scripts/both.py Arjam_MOE_R1_fastxtrimmed.fastq Arjam_MOE_R2_fastxtrimmed.fastq 

Trinity.pl --seqType fq --left Arjam_MOE_R1_fastxtrimmed.fastq.both --right Arjam_MOE_R2_fastxtrimmed.fastq.both --CPU 4 --JM 20G --output output_4

----------------------------------------------------------------

Yay! I fixed the /1 /2 problem. You can see that the fasta files trinity creates there is now the /1 and /2. Now we will see if we can get past chrysalis and if the other error with the c++libs is fixed.

head left.fa
>HWI-ST885:197:HA2RPADXX:1:1101:1464:2292/1
GCCCTTGCCGCCAGCTGGCGCGGCCACGCAGCGGCTGAGCGAGTCTAGGCGCGCGCTGTACTCGGTGAACTTCTTTTGGTAGCGCAGAGCAGTCATTTGC
>HWI-ST885:197:HA2RPADXX:1:1101:1438:2386/1
GGGTGGAGCAAATTCTGGGCCAAAAGTAGTAGGTTCTGAAGGAGGAGGCAATGGTGGTGGTGGTGGACGTGGTCTCCTGAAGTGTCTGCGACTCTCACCA
>HWI-ST885:197:HA2RPADXX:1:1101:1310:2396/1
CTTGGGTCGTGAGTGAGAACAGGCTGGTAGACGGGGCGCTCGCCGAAGGCTGGGATGAAGTCCCGAAACTAAACCCACCAGCGCTGGATGAGGAGAAAGA
>HWI-ST885:197:HA2RPADXX:1:1101:1346:2468/1
ATGGAGCCCAGGCCTCCAGTGCAGAGTGAGTGCTTCCTTCCATGGTCCCCATGCCATCGTGTGACAAGTTCTGTGACCTGATTTCCAGCACTGTCATCCA
>HWI-ST885:197:HA2RPADXX:1:1101:1467:2481/1
GGCCCAGGGAGTCCTCCACCACCCCCTCCCCTTTCCTGGCCTGCTCTCTAATTCTCTAGAAACCTTCCTGTGTATCCTGCCTACTTAAACCCTGCATCCC
head right.fa
>HWI-ST885:197:HA2RPADXX:1:1101:1464:2292/2
CAACGGCGCGCTGCAGGCCCGGCTCGCCGCGCTGCACAAGGCTTTCAAGAAGGAGGCTCTGCGCGCAGGCAAGCGCGAGAAGTCGCTGGTGGCGCAGCTG
>HWI-ST885:197:HA2RPADXX:1:1101:1438:2386/2
CCTGTTTCTCGCCTGGTGAGAGTCGCAGACACTTCAGGAGACCACGTCCACCACCACCACCATTGCCTCCTCCTTCAGAACCTACTACTTTTGGCCCAGA
>HWI-ST885:197:HA2RPADXX:1:1101:1310:2396/2
CAGGCGGGTTCACGTTTGGCACCGCAAAGACGGCGACAACCACCCCTGCCACCGGCTTTTCTTTCTCCTCATCCAGCGCTGGTGGGTTTAGTTTCGGGAC
>HWI-ST885:197:HA2RPADXX:1:1101:1346:2468/2
GGGGGTGGAAATTTTGAGACAAATGTTGATTCTTGGTGAGTTGATGAGTCTTTTCTAGACACATAGAGAAGGTGCTGAAGATTGAGAGAAAACGCCTTCC
>HWI-ST885:197:HA2RPADXX:1:1101:1467:2481/2

GTCAAACTCGTTGTAGCAGATTCTACTTGGGATGCAGGGTTTAAGTAGGCAGGATACACAGGAAGGTTTCTAGAGAATTAGAGAGCAGGCCAGGAAAGGG

To be continued.