Thursday, May 26, 2016

How to blast against a transcriptome

A guest post by Ramatu Abubakar.
  • Log into the server and make your own folder.
  • Import the FASTA file you want to blast against the transcriptome, for example "Mc1r_Fasta.txt".
Creating your own database:
  • Under your folder, make a single folder for all your transcriptome FASTA files.
  • Make a new folder for the transcriptome you want to search and move that one transcriptome file into it. For example, my folder was called "PE111_Cabr_VNO_trinity_output" and my transcriptome file was "PE111_Cabr_VNO_trinity_output.Trinity.fasta".

Command for creating database:
  • cd into the folder containing the transcriptome file you want to make the database for.
Type in the following:

makeblastdb -in (transcriptome file name) -title (name of the folder containing the transcriptome) -dbtype (prot for a database of proteins, nucl for a database of DNA or RNA) -out (name of your output)

What it means:
  • makeblastdb tells the BLAST program to create a database.
  • -in is the input file (your transcriptome FASTA file).
  • -title is the title of the BLAST database to be created.
  • -dbtype tells the BLAST program whether the input is protein (prot) or nucleotide (nucl) sequence.
  • -out is the base name of the database files that are created. You can call it anything you want.
Example below:

105-238:PE111_Cabr_VNO_trinity_output grads$ makeblastdb -in PE111_Cabr_VNO_trinity_output.Trinity.fasta -title PE111_Cabr_VNO_trinity_output -dbtype nucl -out PE111_Cabr_VNO_trinity_output.Trinity.aa


Building a new DB, current time: 05/26/2016 13:18:13
New DB name:   /Users/grads/Ramatu_new_transcriptome/PE111_Cabr_VNO_trinity_output/PE111_Cabr_VNO_trinity_output.Trinity.aa
New DB title:  PE111_Cabr_VNO_trinity_output
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 90410 sequences in 5.0665 seconds.

You should now see the new database files in your folder (make sure to refresh).
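If you have several transcriptomes, the build step and a quick check of its output can be wrapped in two small shell functions. This is only a sketch and not part of the original workflow: the names `make_nucl_db`, `check_nucl_db` and `DRY_RUN` are made up for illustration, the database is named after the FASTA file (an assumption, so it will not carry the ".aa" ending used in the example), and the check assumes a small, single-volume nucleotide database whose files end in .nhr, .nin and .nsq.

```shell
#!/bin/sh
# Hypothetical helper: build a nucleotide BLAST database from one FASTA file.
# Usage: make_nucl_db path/to/file.fasta
# With DRY_RUN=1 the command is only printed, not run.
make_nucl_db() {
    fasta="$1"
    # Assumption: reuse the file name (minus ".fasta") as title and output name.
    name="$(basename "$fasta" .fasta)"
    dir="$(dirname "$fasta")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "makeblastdb -in $fasta -title $name -dbtype nucl -out $dir/$name"
    else
        makeblastdb -in "$fasta" -title "$name" -dbtype nucl -out "$dir/$name"
    fi
}

# Hypothetical helper: check that the three core database files exist.
# Usage: check_nucl_db path/to/db_name   (the value you gave to -out)
check_nucl_db() {
    db="$1"
    missing=0
    for ext in nhr nin nsq; do
        if [ ! -f "$db.$ext" ]; then
            echo "missing: $db.$ext"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "database files found for $db"
    fi
    # Exit status 0 when all three files exist, 1 otherwise.
    return "$missing"
}
```

For example, after setting `DRY_RUN=1`, running `make_nucl_db PE111_Cabr_VNO_trinity_output/PE111_Cabr_VNO_trinity_output.Trinity.fasta` prints the command it would run, which is a cheap way to check the names before building anything.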

"nhr is the header file, nin is the index file and nsq is the sequence file. You dont really have to know this. Blast 'just' needs this"

Blasting against the transcriptome database:
  • cd back into your main folder
Type in the following:
blastn (use blastn for nucleotide sequences and blastp for protein sequences) -query (FASTA file you want to search for in the transcriptome) -db (name of the database you created) -out (anything you want your output file to be called)

105-238:PE111_Cabr_VNO_trinity_output grads$ cd
105-238:~ grads$ cd Ramatu_new_transcriptome
105-238:Ramatu_new_transcriptome grads$ ls
Mc1r_Fasta.txt PE111_Cabr_VNO_trinity_output new_transcriptomes
105-238:Ramatu_new_transcriptome grads$ blastn -query Mc1r_Fasta.txt -db /Users/grads/Ramatu_new_transcriptome/PE111_Cabr_VNO_trinity_output/PE111_Cabr_VNO_trinity_output.Trinity.aa -out Mc1r_blastn_1.txt
105-238:Ramatu_new_transcriptome grads$ 

If done correctly, you should get a new result file in your main folder ("Mc1r_blastn_1.txt" in the example above). Make sure to refresh.
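By default the result file is a plain-text report with the alignments written out. If you would rather have one hit per line, blastn also accepts `-outfmt 6` (tab-separated columns) and `-evalue` (a cutoff for weak hits). Neither option was used in the run above; the 1e-5 cutoff, the output file name and the `top_hits` helper below are illustrative choices, not recommendations from the original post.

```shell
#!/bin/sh
# Not run in the original post: the same search, but with tabular output.
# blastn -query Mc1r_Fasta.txt -db (database name created) -outfmt 6 -evalue 1e-5 -out Mc1r_blastn_tab.txt

# Hypothetical helper: print the N highest-scoring hits from a tabular file.
# In the default -outfmt 6 layout, column 2 is the matching transcript,
# column 11 is the e-value and column 12 is the bit score.
# Usage: top_hits results.txt [N]   (N defaults to 5)
top_hits() {
    file="$1"
    n="${2:-5}"
    # Sort on column 12 (bit score), numerically, highest first.
    sort -t "$(printf '\t')" -k12,12nr "$file" | head -n "$n"
}
```

Calling `top_hits Mc1r_blastn_tab.txt 3` would then list the three transcripts that match the query best, which is usually the first thing you want to look at.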