Sunday, September 16, 2012

Most of the delay in the intron repeat sorting program came from trying to set up a new IDE on a new machine. After installing the GCC compiler, it was a nightmare trying to get a program to build in Eclipse. Days were spent setting up environment variables. Finally, all hope was abandoned and I tried Code::Blocks. The initial RepeatScout program (at least the part I am going to be working on) built and ran on the first try. SIGH. I could finally get the ball rolling.

Yesterday and today were by far the most productive. The first problem I wanted to tackle was user input. Instead of making the user follow confusing parameters, the program prompts for the .fas file location in a very friendly manner. The program now reads FASTA sequences and locates repetitive elements via the l-mer algorithm mentioned below. My intention is to score the introns by their degree of repetitiveness. The score will be based on the number of times an l-mer repeats, and the sequences will then be sorted by score. The output will be a sorted .fas file.
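To make the scoring idea concrete, here is a rough sketch in Python (chosen for brevity; it is not the program's own code and not RepeatScout's actual procedure). It assumes one possible definition of the score: every l-mer that occurs n times in a sequence contributes n - 1. The names `repeat_score` and `sort_by_score`, the default `l=4`, and the two example introns are all made up for illustration.

```python
from collections import Counter


def repeat_score(seq, l=4):
    """Assumed score: each l-mer seen n times in seq contributes n - 1."""
    seq = seq.upper()
    # Slide a window of length l across the sequence and count each l-mer.
    counts = Counter(seq[i:i + l] for i in range(len(seq) - l + 1))
    return sum(n - 1 for n in counts.values())


def sort_by_score(records, l=4):
    """Return (header, sequence) pairs, most repetitive first."""
    return sorted(records, key=lambda rec: repeat_score(rec[1], l), reverse=True)


# Hypothetical example introns, not real data.
introns = [
    ("intron_a", "ACGTTGCAAGCT"),
    ("intron_b", "ACACACACACAC"),
]
ranked = sort_by_score(introns)
```

With this definition, a sequence with no repeated l-mer scores 0, and a sequence shorter than l also scores 0 because there are no windows to count. Other definitions (for example, counting only the most frequent l-mer) would change the ranking, so the choice is worth settling before the sort is written.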

To do:
  • Determine whether the program is reading the file as individual sequences or as one giant genomic sequence. It should ideally be the former, but I need to double check. First priority.
  • Write a user prompt for the name of the output file where the sorted FASTA sequences will be written.
  • Put in counter to detect number of times a repetitive element is found in a sequence. This will be the score.
  • Sort list with the score.
  • Write output to .fas
If worked on full-time, it could probably be finished in two days. 
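
For the first and last items on the list, the sketch below shows one way to read a FASTA file as individual records rather than one long sequence, and to write records back out. It is Python for brevity and is only an illustration under assumptions: `read_fasta`, `write_fasta`, the 60-character line width, and the sample records are all hypothetical, and the real program may handle these steps differently.

```python
import io


def read_fasta(handle):
    """Yield one (header, sequence) pair per record in a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping sequence lines."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Hypothetical two-record input; a real run would open the .fas file instead.
sample = io.StringIO(">intron_1\nACGT\nACGT\n>intron_2\nTTTT\n")
records = list(read_fasta(sample))

out = io.StringIO()
write_fasta(records, out)
```

Counting the records that come back is a quick check on the first to-do item: if a multi-sequence file yields a single record, the input is being treated as one giant sequence. The input and output file names could come from a prompt and be passed to `open()` in place of the in-memory streams used here.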

Also need to type up minutes from Skype meeting with Sushma and Liliana from Thursday. Tomorrow....
