Laurel's Lab Notebook: 2012

Tuesday, November 20, 2012

Questions guiding this niche-modeling project:
-Are the predicted niches distinct? Do all subspecies show the same niche?
-Are new data showing interesting observations/alternative hypotheses?
-Does land use data show something interesting?
-What about mapping the tree? Is this interesting?

Thursday, November 1, 2012

Finished georeferencing and entering the museum records and my field data, mostly using Google Earth searches and the gazetteer. Still to do: enter elevations for SR's coordinates. Also need to parse and georeference the bird field reports sent to me by CBD and Morten.

Questions:
-How many samples are enough for the present data?
-How do you deal with scale issues? Some species have very specific/exact locations, while others are only known to the province level...
-If more than one bird is noted at a location (e.g., 20 birds of this species noted at this location at this time), do you record all 20 birds as 20 separate samples?

Sunday, September 16, 2012

Most of the delay on the intron repeat sorting program came from setting up a new IDE on a new machine. After installing a GCC compiler, it was a nightmare trying to get a program to build in Eclipse. Days were spent setting up environment variables. Finally, all hope was abandoned and I tried Code::Blocks. The initial RepeatScout program (at least the part I am going to be working on) built and ran on the first try. SIGH. Finally got the ball rolling.

Yesterday and today were by far the most productive. The first problem I wanted to tackle was user input. Instead of making the user follow confusing parameters, the program prompts the user for the .fas file location in a very friendly manner. The program now reads FASTA sequences as input and locates repetitive elements via the l-mer algorithm mentioned below. My intention is to score the introns by their degree of repetitiveness: the score will be based on the number of times an l-mer repeats, and the sequences will then be sorted by score. The output will be a sorted .fas file.
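A minimal sketch of the per-sequence score, assuming the score is simply the number of l-mer occurrences that repeat an l-mer already seen earlier in the same sequence (one of the "TBD" options; `lmer_repeat_score` is a hypothetical name and this is not RepeatScout's code). It is brute force, O(n²·l), which is tolerable for intron-sized sequences but not for whole genomes.

```c
#include <string.h>

/* Hypothetical repeat score for ONE sequence: how many l-mer occurrences
 * duplicate an l-mer already seen earlier in the same sequence.
 * A sequence with no repeated l-mers scores 0.
 * Assumes the sequence is all one case (soft-masked lowercase bases
 * would not match their uppercase counterparts). */
long lmer_repeat_score(const char *seq, size_t l)
{
    size_t n = strlen(seq);
    long score = 0;

    if (l == 0 || n < l)
        return 0;

    /* Compare the l-mer starting at i with every earlier l-mer. */
    for (size_t i = 1; i + l <= n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (strncmp(seq + i, seq + j, l) == 0) {
                score++;   /* count each repeated occurrence once */
                break;
            }
        }
    }
    return score;
}
```

For example, `lmer_repeat_score("ACGTACGT", 4)` returns 1, because ACGT occurs twice. Swapping the inner loop for a hash table of l-mers would be the obvious speed-up if the introns turn out to be long.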

To do:

Determine whether the program is reading the file as individual sequences or as one giant genomic sequence. It should ideally be the first, but I need to double-check. First priority.
Write a user prompt for the name of the output file where the sorted FASTA sequences will be written.
Put in a counter for the number of times a repetitive element is found in a sequence. This will be the score.
Sort the list by score.
Write the output to a .fas file.
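The last two to-do items (sort by score, write a .fas) could look roughly like this in C. `ScoredRecord` and `write_sorted_fasta` are hypothetical names, and ascending order (least repetitive first) is an assumption; flipping the comparator gives the opposite order.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type: one FASTA entry plus its repetitiveness score. */
typedef struct {
    char *header;   /* header text without the leading '>' */
    char *seq;      /* sequence with line breaks removed */
    long score;     /* e.g. number of repeated l-mers in seq */
} ScoredRecord;

/* qsort comparator: least repetitive first (ascending score). */
static int cmp_by_score(const void *a, const void *b)
{
    const ScoredRecord *ra = a;
    const ScoredRecord *rb = b;
    return (ra->score > rb->score) - (ra->score < rb->score);
}

/* Sort recs in place by score, then write them to out as FASTA with
 * sequence lines wrapped at `width` characters (0 = no wrapping).
 * Returns 0 on success, -1 if a write fails. */
int write_sorted_fasta(FILE *out, ScoredRecord *recs, size_t n, size_t width)
{
    qsort(recs, n, sizeof *recs, cmp_by_score);
    for (size_t i = 0; i < n; i++) {
        if (fprintf(out, ">%s\n", recs[i].header) < 0)
            return -1;
        for (const char *p = recs[i].seq; *p != '\0'; ) {
            size_t len = strlen(p);
            size_t chunk = (width == 0 || len < width) ? len : width;
            if (fprintf(out, "%.*s\n", (int)chunk, p) < 0)
                return -1;
            p += chunk;
        }
    }
    return 0;
}
```

`qsort` is not stable, so introns with equal scores may come out in any order; if the original order matters for ties, the comparator could fall back to an index field.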

If I worked on it full-time, it could probably be finished in two days.

Also need to type up the minutes from Thursday's Skype meeting with Sushma and Liliana. Tomorrow...

Wednesday, September 5, 2012

So RepeatScout is only going to serve as inspiration for a future mini-program. It doesn't do what I want it to, and nothing else does either, so I am going to use the same line of thinking and write a simpler program. Also, I hate the command-line interface of RepeatScout and of every other program I am trying, so that is the first thing that's going to change.

Now to clarify my line of thinking...

OBJECTIVE OF PROGRAM: To read FASTA sequences, find repetitive elements, and note a degree of repetitiveness (# of repetitive elements? # of l-mers repeated? both? TBD). It will then sort the original FASTA sequences by degree of repetitiveness. Output will be a list of sorted FASTA sequences (and maybe something else giving more details, we'll see).

RepeatScout is mostly written in C. I like C, so I will probably write my program in C as well. Remember to add the RepeatScout paper to my Mendeley library to keep track of the exact details of the algorithm.

Goal for the day: Write the file opener, reader, and command prompt. Try to begin incorporating the l-mer finder.
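A sketch of the file opener, reader, and prompt, assuming all records fit in memory. Every line beginning with '>' starts a new record, so the introns stay separate sequences instead of one concatenated genomic string (the point the Sept 16 to-do list needs to double-check). `FastaRecord`, `prompt_for_path`, `read_fasta`, and `free_fasta` are hypothetical names, not part of RepeatScout.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One FASTA entry (hypothetical type for this sketch). */
typedef struct {
    char *header;   /* header text without the leading '>' */
    char *seq;      /* sequence with all line breaks removed */
} FastaRecord;

/* Friendly prompt instead of command-line parameters: ask for the input
 * path and strip the trailing newline. Returns buf, or NULL on EOF. */
char *prompt_for_path(FILE *in, FILE *out, char *buf, size_t size)
{
    fputs("Enter the path to your .fas file: ", out);
    fflush(out);
    if (fgets(buf, (int)size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\r\n")] = '\0';
    return buf;
}

/* Append len bytes of src to the heap string *dst, whose length is *dlen. */
static int append(char **dst, size_t *dlen, const char *src, size_t len)
{
    char *p = realloc(*dst, *dlen + len + 1);
    if (p == NULL)
        return -1;
    memcpy(p + *dlen, src, len);
    *dlen += len;
    p[*dlen] = '\0';
    *dst = p;
    return 0;
}

void free_fasta(FastaRecord *recs, long n)
{
    for (long i = 0; i < n; i++) {
        free(recs[i].header);
        free(recs[i].seq);
    }
    free(recs);
}

/* Read every record from `in` into *out. A '>' at the start of a line
 * opens a NEW record; text before the first '>' is ignored.
 * Returns the number of records, or -1 on a read/allocation error. */
long read_fasta(FILE *in, FastaRecord **out)
{
    char buf[4096];
    FastaRecord *recs = NULL;
    size_t cap = 0, hlen = 0, slen = 0;
    long n = 0;
    int at_line_start = 1, in_header = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strcspn(buf, "\r\n");
        int line_ends = buf[len] != '\0';   /* chunk reaches end of line */

        if (at_line_start && buf[0] == '>') {
            if ((size_t)n == cap) {         /* grow the record array */
                size_t ncap = cap ? 2 * cap : 16;
                FastaRecord *tmp = realloc(recs, ncap * sizeof *tmp);
                if (tmp == NULL)
                    goto fail;
                recs = tmp;
                cap = ncap;
            }
            recs[n].header = recs[n].seq = NULL;
            n++;
            hlen = slen = 0;
            in_header = 1;
            if (append(&recs[n - 1].seq, &slen, "", 0) != 0 ||
                append(&recs[n - 1].header, &hlen, buf + 1, len - 1) != 0)
                goto fail;
        } else if (n > 0) {
            /* rest of an over-long header line, or a sequence line */
            if (append(in_header ? &recs[n - 1].header : &recs[n - 1].seq,
                       in_header ? &hlen : &slen, buf, len) != 0)
                goto fail;
        }
        at_line_start = line_ends;
        if (line_ends)
            in_header = 0;
    }
    if (ferror(in))
        goto fail;
    *out = recs;
    return n;

fail:
    free_fasta(recs, n);
    return -1;
}
```

A `main` would call `prompt_for_path(stdin, stdout, path, sizeof path)`, `fopen` the path, pass the stream to `read_fasta`, and finish with `free_fasta`.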

Thursday, August 30, 2012

By limiting primer lengths to 15-20bp in Geneious, our list of introns narrowed from 900 to the 134 introns we were able to generate primers for. Two base pairs make a huge difference!

Limiting primer lengths to 15-21bp, primers were found for 348 of the 900 target sequences, with over 3,000 primers found in total.

Decided it made more sense to sift through the introns first to see which are repetitive before actually designing the primers. There are probably many different ways to go about this. The task is finding repeated subsequences within a larger sequence, which reminds me of the n-mer algorithm. Looked through the various software already available. Saha et al. 2008 provided a good overview of the software and a review of what I am actually trying to do. Settled on trying RepeatScout, as I could easily install it on the lab Mac and the code looks manageable to modify.

After finally figuring out the parameters, it looks promising. Will pick up where I left off tomorrow.

Wednesday, August 29, 2012

Explored different programs that could design multiplex PCR primers for our sequencing project of the transcriptome of Pteronotus parnellii. We have a collection of intronic regions from the genome and are trying to design primers that produce reads of 100-200 bp. This will be a single-template PCR reaction, as all reads come from the same organism, just different parts of the genome.

Important things to remember in multiplex PCR primer design (in addition to general PCR primer design) via PremierBiosoft:

Primer length should be shorter (18-22bp), but not shorter than 15bp.
Melting temperature of 55-60 °C.
Highly specific (more important for multiple-target multiplex).
Avoid primer-dimers.

Nothing seems too new.
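A rough first-pass screen against these rules of thumb, sketched in C. The Tm uses the simple GC-content approximation Tm = 64.9 + 41·(G+C−16.4)/N, which is a swapped-in shortcut and not the nearest-neighbour calculation Primer3 uses, so it is only good for discarding obvious misses. The primer-dimer check is equally crude: it only asks whether the 3' end of one primer could pair with a stretch of the other. All function names are hypothetical, and uppercase input is assumed for the dimer check.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t length;
    double gc_fraction;
    double tm_celsius;   /* basic GC-content approximation, not nearest-neighbour */
} PrimerStats;

PrimerStats primer_stats(const char *primer)
{
    PrimerStats s = {0, 0.0, 0.0};
    size_t gc = 0;
    for (const char *p = primer; *p; p++) {
        s.length++;
        if (*p == 'G' || *p == 'C' || *p == 'g' || *p == 'c')
            gc++;
    }
    if (s.length > 0) {
        s.gc_fraction = (double)gc / s.length;
        s.tm_celsius = 64.9 + 41.0 * ((double)gc - 16.4) / s.length;
    }
    return s;
}

/* 1 if the primer falls inside both the length and Tm windows. */
int primer_ok(const char *primer, size_t min_len, size_t max_len,
              double min_tm, double max_tm)
{
    PrimerStats s = primer_stats(primer);
    return s.length >= min_len && s.length <= max_len &&
           s.tm_celsius >= min_tm && s.tm_celsius <= max_tm;
}

static char complement(char b)
{
    switch (b) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'G': return 'C';
    case 'C': return 'G';
    default:  return 'N';
    }
}

/* Crude primer-dimer flag: 1 if the reverse complement of the last k
 * bases of primer a occurs anywhere in primer b. */
int three_prime_dimer(const char *a, const char *b, size_t k)
{
    size_t la = strlen(a);
    char rc[64];
    if (k == 0 || k > la || k >= sizeof rc)
        return 0;
    for (size_t i = 0; i < k; i++)
        rc[i] = complement(a[la - 1 - i]);   /* reverse + complement */
    rc[k] = '\0';
    return strstr(b, rc) != NULL;
}
```

With the windows above, `primer_ok(p, 18, 22, 55.0, 60.0)` accepts a 20-mer with 12 G/C (estimated Tm about 55.9 °C) and rejects one with 10 G/C (about 51.8 °C).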

After searching, only a few programs claimed to be able to handle multiplex primer design. Nearly all use the Primer3 algorithm, and nearly all were not open source. Here are the attempts:

PerlPrimer: looks promising, but I did not have a Unix machine on hand. The Windows download with GUI would not work at all for some mysterious reason. Tried running it on Cygwin, but many of the tests were failing, so I figured I would keep looking.
MPprimer: also looked very promising, but only works on Unix machines.
jPCR: Java app that actually works great, but I think it works best for smaller or more similar sequences. I didn't try running it on the server, which could help, but I could only get primer designs for up to ~30 sequences before it froze.
Geneious Pro: (sigh) ended up using the fancy-schmancy one. Although it doesn't have a multiplex primer option (allegedly coming soon), I was able to load 900 sequences as a sequence list and design over 4,000 primers for about 450 sequences. Making the target region smaller, shortening the primer length, and narrowing the GC content range helped narrow things down, but I still need to sift through some of the sequences.

To-Do List:

Write a script to sift out intron sequences that are too repetitive (or is there one out there already?)
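One possible shape for such a script, sketched under assumptions: flag introns that contain a long tandem repeat (a microsatellite such as (CA)n). This is a swapped-in technique, a simple periodicity scan, and not RepeatScout's approach. It only catches tandem repeats, not interspersed ones. The function names and the cutoff `max_run_bp` are hypothetical.

```c
#include <string.h>

/* Length (bp) of the longest tandem-repeat stretch in seq whose repeat
 * unit is 1..max_period bases, counting only stretches that contain at
 * least two full copies of the unit. */
size_t longest_tandem_run(const char *seq, size_t max_period)
{
    size_t n = strlen(seq), best = 0;

    for (size_t p = 1; p <= max_period && p < n; p++) {
        size_t run = 0;   /* consecutive positions with seq[i] == seq[i + p] */
        for (size_t i = 0; i + p < n; i++) {
            if (seq[i] == seq[i + p]) {
                run++;
                /* run matches at period p span run + p bases */
                if (run >= p && run + p > best)
                    best = run + p;
            } else {
                run = 0;
            }
        }
    }
    return best;
}

/* Hypothetical sifting rule: flag an intron if any tandem repeat with a
 * unit of up to max_period bases is at least max_run_bp long. */
int too_repetitive(const char *seq, size_t max_period, size_t max_run_bp)
{
    return longest_tandem_run(seq, max_period) >= max_run_bp;
}
```

For example, `longest_tandem_run("GGCACACACATT", 6)` returns 8, the length of the (CA)4 stretch.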

Today is the birth of my online lab notebook.

Inspiration for an online lab notebook comes from many people, who will soon be listed in the "About" section of the site. Please note that this blog simply serves as a lab notebook and is nothing more than a substitute for a paper lab notebook. All writings serve as reports of actions completed for the day, thoughts worth remembering in the future, summaries and conclusions of journal articles read, and records that can be archived and reviewed when needed and when it comes time for publication. Data and notes written on this site are not presented as final results or as a reflection of the outcomes of my project. Notes are jotted down the same way they would be in a paper notebook, and may only be relevant/comprehensible to me and others involved in the project.

Onwards...