Thursday, August 14, 2014

brew install https://raw.github.com/Homebrew/homebrew-science/master/fastx_toolkit.rb
brew link --overwrite fastx_toolkit

Linking /usr/local/Cellar/fastx_toolkit/0.0.14... 
Error: Could not symlink include/gtextutils/gtextutils/container_join.h
/usr/local/include/gtextutils/gtextutils is not writable.

brew install https://raw.github.com/Homebrew/homebrew-science/master/fastx_toolkit.rb
######################################################################## 100.0%
Warning: fastx_toolkit-0.0.14 already installed, it's just not linked

sudo chown -R laurelyohe /usr/local/include
brew link --overwrite fastx_toolkit

Alright! I think it is installed. I had avoided fastx-toolkit until now because it is such a pain in the butt to install, but alas, the uncertain has become certain. Why am I doing this? Because I am getting these alarming warning messages while Trinity is running. Trinity seems to carry on regardless, but it doesn't seem right.

...
warning: ignoring read XXXX since it cannot decipher if /1 or /2 of a pair.
Error: note that there were 4329123 reads that could not be deciphered as being /1 or /2 of a PE fragment. Hopefully, these were SE reads that should have been ignored. Otherwise, please research this further.

Okay, awesome. Seems like someone else ran into this problem:

After comparing my input files before and after trimming, it is clear that the trimmed FASTQ files have lost track of whether a read is the left or the right mate of a pair. Uggh! I knew that girl's Trinity help was too good to be true.

head Arjam_MOE_R1.fq
@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292 1:N:0:ACAGTG
GCCCTTGCCGCCAGCTGGCGCGGCCACGCAGCGGCTGAGCGAGTCTAGGCGCGCGCTGTACTCGGTGAACTTCTTTTGGTAGCGCAGAGCAGTCATTTGC
+
CCCFFFFFHHHHHJJJIJJJJJJJJJJJJJJIHHFDDCDDDDDDDDDDDCBBDBBBDDDDDEDDD@DDDDDDDDDDDDDACCDDDDDDDDDDDDDDEEED
@HWI-ST885:197:HA2RPADXX:1:1101:1438:2386 1:N:0:ACAGTG
GGGTGGAGCAAATTCTGGGCCAAAAGTAGTAGGTTCTGAAGGAGGAGGCAATGGTGGTGGTGGTGGACGTGGTCTCCTGAAGTGTCTGCGACTCTCACCA
+
CCCFFFFFDHHHHJGGHGIGIJJJHIFHIGIIJJJIIIIGHGGHI?FGHIGIIJ8@=5BEHHH9BEFEDCDDBDDDDDDCCD>@CCDDDDBDBBDDCCBC
@HWI-ST885:197:HA2RPADXX:1:1101:1310:2396 1:N:0:ACAGTG
CTTGGGTCGTGAGTGAGAACAGGCTGGTAGACGGGGCGCTCGCCGAAGGCTGGGATGAAGTCCCGAAACTAAACCCACCAGCGCTGGATGAGGAGAAAGA

head Arjam_MOE_R2.fq
@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292 2:N:0:ACAGTG
CAACGGCGCGCTGCAGGCCCGGCTCGCCGCGCTGCACAAGGCTTTCAAGAAGGAGGCTCTGCGCGCAGGCAAGCGCGAGAAGTCGCTGGTGGCGCAGCTG
+
CCCFFFFFHHHHHJJJJJJIIJIEIBEHBBBBBDDD<CDDDDDDDCCDDDDDD?ABBBDDDDDDDBBDD@DDDDDDD<BBDDDDDDDBB<CBABD@BDDC
@HWI-ST885:197:HA2RPADXX:1:1101:1438:2386 2:N:0:ACAGTG
CCTGTTTCTCGCCTGGTGAGAGTCGCAGACACTTCAGGAGACCACGTCCACCACCACCACCATTGCCTCCTCCTTCAGAACCTACTACTTTTGGCCCAGA
+
BBBFFFFFHHGHHIJJGHHHHIGHIIIIJJJJJJJJJJIJIJFJJGHIIHJGHJJEHHHFFFFEEEEEEDDDDDDDDDDDDDDDDDDDDDEDDDDDDBDD
@HWI-ST885:197:HA2RPADXX:1:1101:1310:2396 2:N:0:ACAGTG
CAGGCGGGTTCACGTTTGGCACCGCAAAGACGGCGACAACCACCCCTGCCACCGGCTTTTCTTTCTCCTCATCCAGCGCTGGTGGGTTTAGTTTCGGGAC

Okay, see how there is a 1 and a 2 in the header line? It is the first field after the space: 1:N:0:ACAGTG in R1 and 2:N:0:ACAGTG in R2.

head Arjam_MOE_R1_trim.fq
@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292
GCCCTTGCCGCCAGCTGGCGCGGCCACGCAGCGGCTGAGCGAGTCTAGGCGCGCGCTGTACTCGGTGAACTTCTTTTGGTAGCGCAGAGCAGTCATTTGC
+
CCCFFFFFHHHHHJJJIJJJJJJJJJJJJJJIHHFDDCDDDDDDDDDDDCBBDBBBDDDDDEDDD@DDDDDDDDDDDDDACCDDDDDDDDDDDDDDEEED
@HWI-ST885:197:HA2RPADXX:1:1101:1438:2386
GGGTGGAGCAAATTCTGGGCCAAAAGTAGTAGGTTCTGAAGGAGGAGGCAATGGTGGTGGTGGTGGACGTGGTCTCCTGAAGTGTCTGCGACTCTCACCA
+
CCCFFFFFDHHHHJGGHGIGIJJJHIFHIGIIJJJIIIIGHGGHI?FGHIGIIJ8@=5BEHHH9BEFEDCDDBDDDDDDCCD>@CCDDDDBDBBDDCCBC
@HWI-ST885:197:HA2RPADXX:1:1101:1310:2396
CTTGGGTCGTGAGTGAGAACAGGCTGGTAGACGGGGCGCTCGCCGAAGGCTGGGATGAAGTCCCGAAACTAAACCCACCAGCGCTGGATGAGGAGAAAGA

head Arjam_MOE_R2_trim.fq
@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292
CAACGGCGCGCTGCAGGCCCGGCTCGCCGCGCTGCACAAGGCTTTCAAGAAGGAGGCTCTGCGCGCAGGCAAGCGCGAGAAGTCGCTGGTGGCGCAGCTG
+
CCCFFFFFHHHHHJJJJJJIIJIEIBEHBBBBBDDD<CDDDDDDDCCDDDDDD?ABBBDDDDDDDBBDD@DDDDDDD<BBDDDDDDDBB<CBABD@BDDC
@HWI-ST885:197:HA2RPADXX:1:1101:1438:2386
CCTGTTTCTCGCCTGGTGAGAGTCGCAGACACTTCAGGAGACCACGTCCACCACCACCACCATTGCCTCCTCCTTCAGAACCTACTACTTTTGGCCCAGA
+
BBBFFFFFHHGHHIJJGHHHHIGHIIIIJJJJJJJJJJIJIJFJJGHIIHJGHJJEHHHFFFFEEEEEEDDDDDDDDDDDDDDDDDDDDDEDDDDDDBDD
@HWI-ST885:197:HA2RPADXX:1:1101:1310:2396
CAGGCGGGTTCACGTTTGGCACCGCAAAGACGGCGACAACCACCCCTGCCACCGGCTTTTCTTTCTCCTCATCCAGCGCTGGTGGGTTTAGTTTCGGGAC

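See how the part after the space goes away? Bollocks! This happens after running python ~/Scripts/q-trim.py, which leaves only the read name in the header line, so the two files end up with identical headers.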
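
A quick way to check which mate a header claims to be is sketched below. This is not part of q-trim.py or Trinity; mate_of is a name I made up, and it assumes only the two header styles that show up in this post: the Illumina style with a space followed by 1:N:0:ACAGTG, and the older style where the read name ends in /1 or /2.

```python
import re

# Header with a comment field after a space, e.g. "@NAME 1:N:0:ACAGTG"
CASAVA = re.compile(r"^@?\S+\s+([12]):[YN]:\d+:\S*$")
# Older style where the read name itself ends in /1 or /2
SLASH = re.compile(r"^@?\S+/([12])$")


def mate_of(header):
    """Return 1 or 2 if the header says which mate it is, else None."""
    header = header.strip()
    for pattern in (CASAVA, SLASH):
        match = pattern.match(header)
        if match:
            return int(match.group(1))
    return None


# Headers copied from the head output in this post
print(mate_of("@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292 1:N:0:ACAGTG"))
print(mate_of("@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292 2:N:0:ACAGTG"))
print(mate_of("@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292"))
```

The first two calls print 1 and 2. The third, which is the header as it looks after trimming, prints None: nothing left in the line says which side of the pair the read came from.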

Let's see if fastq_quality_filter strips it too.
fastq_quality_filter -i Arjam_MOE_R1.fq -o Arjam_MOE_R1_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 15149790 reads.
Output: 14691119 reads.

head Arjam_MOE_R1_fastxtrimmed.fastq 
@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292 1:N:0:ACAGTG
GCCCTTGCCGCCAGCTGGCGCGGCCACGCAGCGGCTGAGCGAGTCTAGGCGCGCGCTGTACTCGGTGAACTTCTTTTGGTAGCGCAGAGCAGTCATTTGC
+
CCCFFFFFHHHHHJJJIJJJJJJJJJJJJJJIHHFDDCDDDDDDDDDDDCBBDBBBDDDDDEDDD@DDDDDDDDDDDDDACCDDDDDDDDDDDDDDEEED
@HWI-ST885:197:HA2RPADXX:1:1101:1438:2386 1:N:0:ACAGTG
GGGTGGAGCAAATTCTGGGCCAAAAGTAGTAGGTTCTGAAGGAGGAGGCAATGGTGGTGGTGGTGGACGTGGTCTCCTGAAGTGTCTGCGACTCTCACCA
+
CCCFFFFFDHHHHJGGHGIGIJJJHIFHIGIIJJJIIIIGHGGHI?FGHIGIIJ8@=5BEHHH9BEFEDCDDBDDDDDDCCD>@CCDDDDBDBBDDCCBC
@HWI-ST885:197:HA2RPADXX:1:1101:1310:2396 1:N:0:ACAGTG
CTTGGGTCGTGAGTGAGAACAGGCTGGTAGACGGGGCGCTCGCCGAAGGCTGGGATGAAGTCCCGAAACTAAACCCACCAGCGCTGGATGAGGAGAAAGA

Cool! Looks like it kept the extended header on.

fastq_quality_filter -i Arjam_MOE_R2.fq -o Arjam_MOE_R2_fastxtrimmed.fastq -q 20 -p 80 -Q 33 -v
Quality cut-off: 20
Minimum percentage: 80
Input: 15149790 reads.
Output: 14464395 reads.
discarded 685395 (4%) low-quality reads.

For once something worked. Oh yeah, just as a side note: if you have Illumina reads with Phred+33 quality scores (like these), you must add the -Q 33 parameter. This, of course, is an undocumented fix discovered by users. Three-plus years after this discovery, -Q has yet to be documented in the fastx-toolkit.
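
Here is my understanding of what -Q 33, -q 20 and -p 80 mean, as a small Python sketch. It is not fastx-toolkit code: phred33 and passes_filter are names I made up, and the keep rule (at least p percent of the bases have a quality of q or more) is my reading of the option descriptions, not something I checked against the source.

```python
def phred33(quality_line):
    """Decode a FASTQ quality string that uses an ASCII offset of 33."""
    return [ord(ch) - 33 for ch in quality_line]


def passes_filter(quality_line, min_quality=20, min_percent=80):
    """Keep a read if at least min_percent of its bases reach min_quality."""
    scores = phred33(quality_line)
    if not scores:
        return False
    good = sum(1 for score in scores if score >= min_quality)
    return 100.0 * good / len(scores) >= min_percent


# First 16 quality characters of the first read in Arjam_MOE_R1.fq
start = "CCCFFFFFHHHHHJJJ"
print(min(phred33(start)), max(phred33(start)))  # 34 41
print(passes_filter(start))                      # True
```

With an offset of 64 instead of 33, those same characters would decode to scores between 3 and 10, and characters such as 8, 5 and = (which also appear in these files) would come out negative. That is why the offset has to be stated.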

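I am still going to do the pair matching before running Trinity again, because the two filtered files no longer hold the same number of reads (14691119 in R1 versus 14464395 in R2).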

~/Scripts/both.py Arjam_MOE_R1_fastxtrimmed.fastq Arjam_MOE_R2_fastxtrimmed.fastq 
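
For anyone without both.py, the idea of pair matching can be sketched like this. This is not the script I ran, just an illustration of the technique: read_id, fastq_records and keep_pairs are hypothetical names, the .both suffix simply mirrors the file names used in the Trinity command, and the sketch assumes plain four-line FASTQ records and enough memory to hold all read names.

```python
def read_id(header):
    """Read name without the leading '@' and without the mate information."""
    name = header.strip().split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name[1:] if name.startswith("@") else name


def fastq_records(path):
    """Yield (header, sequence, plus, quality) from a 4-line-per-read FASTQ."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0].strip():
                return  # end of file (or a trailing blank line)
            yield tuple(line.rstrip("\n") for line in record)


def keep_pairs(left_path, right_path, suffix=".both"):
    """Write only the reads whose mate is present in the other file."""
    left_ids = {read_id(rec[0]) for rec in fastq_records(left_path)}
    shared = {read_id(rec[0]) for rec in fastq_records(right_path)} & left_ids
    for path in (left_path, right_path):
        with open(path + suffix, "w") as out:
            for rec in fastq_records(path):
                if read_id(rec[0]) in shared:
                    out.write("\n".join(rec) + "\n")
    return len(shared)


# Example call with the file names from this post:
# keep_pairs("Arjam_MOE_R1_fastxtrimmed.fastq", "Arjam_MOE_R2_fastxtrimmed.fastq")
```

Each output keeps the order of its input, so the two .both files line up read for read as long as the original R1 and R2 files were in the same order, which holds here because the quality filter only drops reads.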

Trinity.pl --seqType fq --left Arjam_MOE_R1_fastxtrimmed.fastq.both --right Arjam_MOE_R2_fastxtrimmed.fastq.both --CPU 4 --JM 20G --output output_4

----------------------------------------------------------------

Yay! I fixed the /1 /2 problem. You can see that the FASTA files Trinity creates (left.fa and right.fa) now have /1 and /2 at the end of each read name. Now we will see if we can get past Chrysalis and whether the other error with the C++ libs is fixed.

head left.fa
>HWI-ST885:197:HA2RPADXX:1:1101:1464:2292/1
GCCCTTGCCGCCAGCTGGCGCGGCCACGCAGCGGCTGAGCGAGTCTAGGCGCGCGCTGTACTCGGTGAACTTCTTTTGGTAGCGCAGAGCAGTCATTTGC
>HWI-ST885:197:HA2RPADXX:1:1101:1438:2386/1
GGGTGGAGCAAATTCTGGGCCAAAAGTAGTAGGTTCTGAAGGAGGAGGCAATGGTGGTGGTGGTGGACGTGGTCTCCTGAAGTGTCTGCGACTCTCACCA
>HWI-ST885:197:HA2RPADXX:1:1101:1310:2396/1
CTTGGGTCGTGAGTGAGAACAGGCTGGTAGACGGGGCGCTCGCCGAAGGCTGGGATGAAGTCCCGAAACTAAACCCACCAGCGCTGGATGAGGAGAAAGA
>HWI-ST885:197:HA2RPADXX:1:1101:1346:2468/1
ATGGAGCCCAGGCCTCCAGTGCAGAGTGAGTGCTTCCTTCCATGGTCCCCATGCCATCGTGTGACAAGTTCTGTGACCTGATTTCCAGCACTGTCATCCA
>HWI-ST885:197:HA2RPADXX:1:1101:1467:2481/1
GGCCCAGGGAGTCCTCCACCACCCCCTCCCCTTTCCTGGCCTGCTCTCTAATTCTCTAGAAACCTTCCTGTGTATCCTGCCTACTTAAACCCTGCATCCC

head right.fa
>HWI-ST885:197:HA2RPADXX:1:1101:1464:2292/2
CAACGGCGCGCTGCAGGCCCGGCTCGCCGCGCTGCACAAGGCTTTCAAGAAGGAGGCTCTGCGCGCAGGCAAGCGCGAGAAGTCGCTGGTGGCGCAGCTG
>HWI-ST885:197:HA2RPADXX:1:1101:1438:2386/2
CCTGTTTCTCGCCTGGTGAGAGTCGCAGACACTTCAGGAGACCACGTCCACCACCACCACCATTGCCTCCTCCTTCAGAACCTACTACTTTTGGCCCAGA
>HWI-ST885:197:HA2RPADXX:1:1101:1310:2396/2
CAGGCGGGTTCACGTTTGGCACCGCAAAGACGGCGACAACCACCCCTGCCACCGGCTTTTCTTTCTCCTCATCCAGCGCTGGTGGGTTTAGTTTCGGGAC
>HWI-ST885:197:HA2RPADXX:1:1101:1346:2468/2
GGGGGTGGAAATTTTGAGACAAATGTTGATTCTTGGTGAGTTGATGAGTCTTTTCTAGACACATAGAGAAGGTGCTGAAGATTGAGAGAAAACGCCTTCC
>HWI-ST885:197:HA2RPADXX:1:1101:1467:2481/2
GTCAAACTCGTTGTAGCAGATTCTACTTGGGATGCAGGGTTTAAGTAGGCAGGATACACAGGAAGGTTTCTAGAGAATTAGAGAGCAGGCCAGGAAAGGG
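
The mapping from the original FASTQ headers to these /1 and /2 names is simple enough to write down. Trinity did the conversion itself here; the sketch below only illustrates it, and to_slash_name is a made-up helper, not anything from Trinity.

```python
def to_slash_name(header):
    """Turn '@NAME 1:N:0:ACAGTG' into 'NAME/1' (and likewise for mate 2)."""
    fields = header.strip().lstrip("@").split()
    if len(fields) < 2 or fields[1][:2] not in ("1:", "2:"):
        raise ValueError("no mate number in header: " + header)
    return fields[0] + "/" + fields[1][0]


# First header of Arjam_MOE_R2.fq
print(to_slash_name("@HWI-ST885:197:HA2RPADXX:1:1101:1464:2292 2:N:0:ACAGTG"))
```

This prints HWI-ST885:197:HA2RPADXX:1:1101:1464:2292/2, the same name as the first entry in right.fa. A header that has already lost its 1:N:0:ACAGTG field raises a ValueError instead, since there is nothing left to build the suffix from.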

To be continued.
