Because I did such a horrid job taking comprehensible notes while in London, I am going to be more efficient this time around with transcriptome assembly.
We have our Illumina RNA-seq reads back (yay!).
To download the reads from UArizona, you have to download this horridly complicated program called iRODS (http://uagc.arl.arizona.edu/resources/user-guide). After much struggle, I found that the easiest way to get things installed and compiled is the "non-fancy" way. HOWEVER, you MUST download the most recent version of iRODS (as of right now, 3.3.1).
Navigate to your iRODS folder. Things in blue are what you type. Things in pink are what you would change based on your computer.
Take the rate at node 75 for the sig1 prior: -0.86245
Pick a random branch within this clade for the sig2 prior (Pellorneum_pyrrogenys): 1.1092
Alpha: random uniform distribution runif(1)
rand.shift = 0.05 (the value that Revell et al. use)
To create a new job:
Click "Create New"
Click "de Novo Assembly" and "Data Prep", then click "Next".
The data for C. sowelli is already loaded as a SMRT cell. Start a new job by naming the job, selecting the protocol, choosing "C. sowelli_amp", and moving it from SMRT Cells available to "SMRT Cells in Job". Click "save" and "start" at the bottom right-hand corner.
To view the failed jobs, click on the job. On the right-hand side, click "View Log".
I have tried RS_HGAP Assembly 2 for quality, RS_HGAP 3 for speed, and a Long Amplicon assembly. All three have failed, and it is not clear why, because each seems to start out filtering with no problem.
 Find missing data for high school student
 Figure out how to teach RAxML to a high school student (JW slides) COMPLETE
 Get Gazey-Staley algorithm set up in R and running. COMPLETE
-Okay, so we don't actually have to do this step, as it's just a measure of cloning success (and since we didn't clone, we don't really need to measure this). What we do need to measure (but not right away) is the # of sequences (reads) vs. the # of genes (unique contigs). This will tell us if we have sampled enough of the genome for the OR genes given our degenerate primers.
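A rough sketch of how the reads vs. unique contigs check could look, assuming we already have a list saying which contig each read was assigned to. The function name and the toy data are made up for illustration; nothing here comes from the actual assembly output. The idea is to count unique contigs in growing random subsamples of reads: if the count stops climbing well before all reads are used, we have probably sampled enough.

```python
import random


def accumulation_curve(read_to_contig, steps=5, seed=0):
    """Return (n_reads, n_unique_contigs) pairs for growing random subsamples.

    read_to_contig: iterable with one contig label per read (hypothetical input).
    """
    rng = random.Random(seed)
    assignments = list(read_to_contig)
    rng.shuffle(assignments)  # random read order, reproducible via seed
    total = len(assignments)
    curve = []
    for i in range(1, steps + 1):
        n = total * i // steps
        # number of distinct contigs seen among the first n shuffled reads
        curve.append((n, len(set(assignments[:n]))))
    return curve


# Hypothetical toy data: each entry is the contig a read was assigned to.
toy_reads = ["OR1"] * 50 + ["OR2"] * 30 + ["OR3"] * 15 + ["OR4"] * 5
curve = accumulation_curve(toy_reads, steps=5)
for n_reads, n_contigs in curve:
    print(n_reads, n_contigs)
```

A curve that is still rising at the last step would suggest that more sequencing (or different primers) could still turn up new OR genes.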
Something to still think about regarding my conservation manuscript:
Higher fragmentation of widely distributed species compared to those that are more narrowly distributed
"I agree with this result. In a more general biogeographic sense, isn’t this expected given that a larger area has a greater probability of being fragmented than a smaller one (in the same way that a short sequence has lower probability of acquiring a mutation than a long one)? Did you scale for that somehow (have no idea how, but we can talk about it)?"
We compared proportions of distributions covered by high HII. Not sure if this counts as scaling...
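To make the "proportion" comparison concrete, here is a minimal sketch. The function name and the areas are invented for illustration and are not values from the manuscript. Dividing the high-HII area by the total range area puts a wide-ranging and a narrow-ranging species on the same 0 to 1 scale, which is at least one kind of scaling by range size. Whether that fully answers the comment above is still an open question.

```python
def high_hii_proportion(range_area_km2, high_hii_area_km2):
    """Fraction of a species' range that falls in high-HII cells."""
    if range_area_km2 <= 0:
        raise ValueError("range area must be positive")
    if not 0 <= high_hii_area_km2 <= range_area_km2:
        raise ValueError("high-HII area must lie between 0 and the range area")
    return high_hii_area_km2 / range_area_km2


# Hypothetical areas: a widely distributed and a narrowly distributed species.
wide = high_hii_proportion(10000.0, 2500.0)
narrow = high_hii_proportion(400.0, 100.0)
print(wide, narrow)
```

In this toy case the absolute high-HII areas differ 25-fold, but the proportions are identical.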
Because most of the reads are so long and ORs are technically around 900 bp, Geneious could essentially align the sequences from the F and R primers, and there is a good chance that we will get assembled OR sequences (which we know works, because my adviser and I did it yesterday).
In a way this makes more sense, since the goal of most assembly programs is to assemble a genome, making a scaffold. However, we have subgenome sequences that are scattered throughout the genome. Basically, we are assembling many genes at once and don't need to put them in one big scaffold string.
However, I am still more comfortable with using Celera for two reasons:
1) Sequences that are too repetitive are removed. Celera's algorithm separates the sequences into unique "unitigs", which are set as the seed scaffold, and repetitive or non-unique "unitigs", which are saved for later to try to overlap onto the seed scaffold. Because we don't really care about the scaffold, the output from Celera that we do care about is actually the degenerate unitigs, which are unique contigs (more than one read) that do not fit into a scaffold, suggesting that they are unique genes. After this, I plan to filter the degenerate unitigs by length. Then, things should be ready for the ORA pipeline.
2) You can easily control the error-correction parameters.
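The length filter planned for the degenerate unitigs could be sketched like this. It is a minimal plain-Python version that assumes the unitigs are in FASTA format; the cutoff and the toy records are placeholders, since I have not settled on an actual length threshold yet.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len, max_len=None):
    """Keep records whose sequence length is within [min_len, max_len]."""
    kept = []
    for header, seq in records:
        if len(seq) < min_len:
            continue
        if max_len is not None and len(seq) > max_len:
            continue
        kept.append((header, seq))
    return kept


# Hypothetical toy input; real unitigs would be read from a file instead.
toy_fasta = [">utg1", "ACGT" * 3, ">utg2", "ACGTACGT", "ACGTACGT", ">utg3", "AC"]
kept = filter_by_length(read_fasta(toy_fasta), min_len=10)  # placeholder cutoff
for header, seq in kept:
    print(header, len(seq))
```

For a real file, `read_fasta(open(path))` would work the same way, since an open file is just an iterable of lines.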
I took out "stopAfter=overlapper" because it interrupted the run. Not sure why we would want to do that.
 Write syllabus for high school student COMPLETE [7/2/2014]
 Look at Trpc2 data big Mac broken :[
 Enter grades into blackboard COMPLETE
 Submit conservation paper COMPLETE [7/2/1/2014]