Showing posts with label scripts. Show all posts
Showing posts with label scripts. Show all posts

Tuesday, September 16, 2014

#remove newlines
awk '{if (substr($0,1,1)==">"){print "\n"$0} else printf("%s",$0);p++;}END{print "\n"}' DR_091_Mobl_MOE_assembled.fasta > joined.fasta

If I try running BLASTALL with my transcriptome data as the database, I get this error:

[blastall 2.2.22] ERROR: SeqPortNew: lcl

All the names of the sequences are too similar so I have to rename them--I am just going to rename them by number.
awk '/^>/{print ">" ++i; next}{print}' < DR_091_Mobl_MOE_assembled.fasta > renamed_DR_091_Mobl_MOE_assembled.fasta

#export the path the blast
export PATH=$PATH:/Users/laurelyohe/blast-2.2.22/bin/

#format the database 
#because the transcriptome is bigger than the indices, it will serve as the database
formatdb -i renamed_DR_091_Mobl_MOE_assembled.fasta -o T -p F
#run blast
blastall -p tblastx -d renamed_DR_013_Mored_MOE_assembled.fasta -i OR_VR_indices.fasta -e 1e-06 -v 5 -b 5 -a 2 -o ./DR_013_Mored_MOE_assembled_v_OR_VR_indices.tblastx

#now we need to parse the BLAST script
perl ~/Scripts/parse_bls.pl --i DR_091_Mobl_MOE_assembled_v_OR_VR_indices.tblastx > DR_091_Mobl_MOE_assembled_v_OR_VR_indices.tblastx.parsed
#next combine the parsed file with the sequences
~/Scripts/extract_hsps_new.pl -i renamed_DR_091_Mobl_MOE_assembled.fasta -t DR_091_Mobl_MOE_assembled_v_OR_VR_indices.tblastx.parsed -s DR_091_Mormoops_blainvelli

Tuesday, August 12, 2014

Here is a blog that gives excellent Trinity instructions:
https://cartwrightlab.wikispaces.com/RNAseq+Assembly+in+Trinity

#for paired end reads, combine each side of pairs into one file
cat pair.DR004_Artibeus_Jamaicensis_MOE_ACAGTG_L001_R1* >> Arjam_MOE_R1.fq
cat pair.DR004_Artibeus_Jamaicensis_MOE_ACAGTG_L001_R2* >> Arjam_MOE_R2.fq

#run a trimming script
python ~/Scripts/q-trim.py Arjam_MOE_R1.fq Arjam_MOE_R1_trim.fq

#woops I run into this error:
importerror no module named screed

#it looks like I need the "screed" module for this; looks like I also need to update "setup tools"
#first install pip
sudo python get-pip.py
pip install ipython #not sure what this did
#still missing the screed module
sudo pip install setuptools --upgrade
sudo pip install git+https://github.com/ged-lab/screed.git

#okay seems to be working now
#trimmed with the default setting in which a minimum Qscore = 5
#minimum length = 30; lets see how this works
python ~/Scripts/q-trim.py Arjam_MOE_R1.fq Arjam_MOE_R1_trim.fq
python ~/Scripts/q-trim.py Arjam_MOE_R2.fq Arjam_MOE_R2_trim.fq

#now i need to assemble paired reads--extracting only the pairs using both.py
python ~/Scripts/both.py Arjam_MOE_R1_trim.fq Arjam_MOE_R2_trim.fq
#this creates two files ...R1_trim.fq.both and R2_trim.fq.both
#this script took about one hour to run

#finally time to run trinity!
Trinity.pl --seqType fq --left Arjam_MOE_R1_trim.fq.both --right Arjam_MOE_R2_trim.fq.both --CPU 4 --JM 20G --output trinity_output/

Everything is up and running now. Stay tuned.