Monday, June 30, 2014

Getting data assembled now. Not using Sprai--going to try to use Celera now (which I have already installed).

After several hours of downloading a million things, it turns out there is a bug in the svn code, but not in the .tar.bz2 file you download directly from SourceForge. For some reason, kmer will not compile from the svn checkout (it must not be up to date).

Anyways, here is what I got to work:
Visit: http://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.1/
bzip2 -dc wgs-8.1.tar.bz2 | tar -xf - #don't compile Celera yet
cd wgs-8.1/kmer
#oops don't have gmake; just make it the same as "make"
sudo ln -s /usr/bin/make /usr/bin/gmake
gmake install
cd ..
cd samtools
make
cd ..
cd src
gmake

#I tried getting Figaro and UMDOverlapper to work, but I don't want to mess
#things up; let's skip them for now

In the README, it says you can run the assembler with:
  wgs-8.1/*/bin/runCA
#in my case, the * is Darwin-i386

The sequences are now kept on the Spare drive
cd /Volumes/Spare/pacbio/C_sowelli

gunzip filtered_subreads.fast*
gunzip reads_of_insert.fast*

#make the FRG wrapper file that runCA takes as input
~/wgs-8.1/Darwin-i386/bin/fastqToCA -libraryname GPC -technology pacbio-raw -reads reads_of_insert.fastq >GPC_untrimmed.frg
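Before wrapping the reads, it can be worth checking that the gunzipped FASTQ is intact. This is only a rough sketch: count_fastq_reads is a name I made up, and it assumes every record is exactly 4 lines (no wrapped sequence lines).

```shell
# Hypothetical helper (not part of wgs-assembler): print the number of
# reads in a FASTQ file, assuming strict 4-line records.
count_fastq_reads() {
    awk 'END {
        if (NR % 4 != 0) {
            # A leftover partial record usually means a truncated file
            print "line count " NR " is not a multiple of 4" > "/dev/stderr"
            exit 1
        }
        print NR / 4
    }' "$1"
}
```

For example, count_fastq_reads reads_of_insert.fastq prints the read count, or fails if the file looks truncated.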
#make the .spec file--for this first run:
#saved as GPC_spec.spec
merSize               = 17
merThreshold          = 0
merDistinct           = 0.9995
merTotal              = 0.995

doOBT                 = 0
doExtendClearRanges   = 0

unitigger             = bogart

ovlErrorRate          = 0.05  #  Compute overlaps up to 5% error
utgGraphErrorRate     = 0.05  #  Unitigs at 5% error
utgMergeErrorRate     = 0.05  #  Unitigs at 5% error
cnsErrorRate          = 0.05  #  Needed to allow ovlErrorRate=0.05
cgwErrorRate          = 0.05  #  Needed to allow ovlErrorRate=0.05

ovlConcurrency        = 16
cnsConcurrency        = 16

ovlThreads            = 1
ovlHashBits           = 22
ovlHashBlockLength    = 10000000
ovlRefBlockSize       = 25000

#cnsReduceUnitigs      = 0 0   #  Always use only uncontained reads for consensus
cnsReuseUnitigs       = 1     #  With no mates, no need to redo consensus

cnsMinFrags           = 1000
cnsPartitions         = 256
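As I understand the runCA options, ovlConcurrency and cnsConcurrency are how many jobs run at once on the local machine, so 16 may be more than this Mac has cores. A sketch for writing a copy of the spec with a different value follows; set_concurrency and the output filename are my own inventions, not part of Celera.

```shell
# Hypothetical helper: print a copy of a spec file with both concurrency
# settings replaced by N. The original file is not modified.
set_concurrency() {
    spec="$1"
    n="$2"
    sed -E "s/^(ovlConcurrency|cnsConcurrency)([[:space:]]*=[[:space:]]*)[0-9]+/\1\2${n}/" "$spec"
}

# Possible usage (left commented out so nothing runs by accident).
# Assumption: getconf _NPROCESSORS_ONLN reports the number of online
# cores on this system; that is worth double checking.
# cores=$(getconf _NPROCESSORS_ONLN)
# set_concurrency GPC_spec.spec "$cores" > GPC_spec.local.spec
```

Only the two concurrency lines change; everything else in the spec passes through untouched.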

#run the assembler
~/wgs-8.1/Darwin-i386/bin/runCA -d /Volumes/Spare/pacbio/C_sowelli/Assembly/GPC-trim -p GPC-trim -s /Volumes/Spare/pacbio/C_sowelli/Assembly/GPC_spec.spec GPC_untrimmed.frg
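Since a run like this takes a long time, a quick check of the arguments first can save a wasted start. A minimal sketch follows; preflight is a hypothetical function of mine, not something runCA provides. It only checks that the output directory is an absolute path and that the spec and .frg files exist and are not empty.

```shell
# Hypothetical pre-flight check for the three things runCA is given here:
# output directory (-d), spec file (-s), and the .frg file.
preflight() {
    outdir="$1"
    spec="$2"
    frg="$3"
    # A path without a leading slash is resolved against the current directory
    case "$outdir" in
        /*) ;;
        *) echo "output directory is not an absolute path: $outdir" >&2
           return 1 ;;
    esac
    # -s is true only for files that exist and have a size above zero
    for f in "$spec" "$frg"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            return 1
        fi
    done
    echo "ok"
}
```

It prints "ok" when all three look sane and returns non-zero with a message otherwise.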

Trying with 5% error rates since the sequences are so similar.



