DESeq2
Salmon, tximport and DESeq2 flow was authored by the same person, Michael Love. I think I could never understand what he did.
Last updated
Salmon, tximport and DESeq2 flow was authored by the same person, Michael Love. I think I could never understand what he did.
Last updated
Simply speaking DESeq2 is an open source package that calculate the differential gene expression between 2 groups. It includes some of the popular visualization method that one could generate with a single line command. Statistic and technical details are available here, where most of the content of this page was extracted from.
First of all to straight things out we can rearrange the folder structure - that is to group all the .sf
files into the working directory of the current R session. This is not necessary computer-wise but I think it helps the beginner to get on their feet faster and better.
A data frame
that contains the necessary information for DESeq2 to understand has to be customized every time for each analysis. The minimum information would be sample name with the matching comparison group tag, in a specific order. Since in the final steps DESeq2 does not take the file paths or names into account, all it does is to index the quantification matrix in alphabetical order and process according to the comparison group, or experiment design
in its own language, listed in the colData data frame, one has to make sure the colData is logged in the exact order as how the file arranges itself.
This sampleTable is essentially the colData
that we will need for DESeq2 analysis later.
As mentioned before, Salmon works on mRNA referencing against cDNA and therefore the .sf file that generated contains code name ENST, meaning the transcriptome in the Ensembl coding system. We need to convert the TPM list of ENST into ENSG, codes in the gene format, and this is where tximport
package kicks in.
Finally we can put the txi
into DESeq2 to get the log2foldchange
of detected genes and their p-values.
The resultsNames(dds) command would call out the log2foldchange result stored inside the dds, and it should look like this:
This means "for everything inside ALS as 1, how does CTRL change?" And this is not exactly what we want. We need the result to be in the reversed order. This unwanted behavior was stemmed from we didn't specific the level
of the comparison group. The default level is always arranged in the alphabetical order if we do not assign it ourselves. In order to relevel the groups
We can finally start to analyse and visualize the data. But first of all what is log2foldchange?
Imagine you are calculating fold-change by measuring counts ratios, so:
FoldChange = counts( Treatment ) / Counts ( Control )
Here, FoldChange values larger than one means the gene is more expressed in Treatment, a FoldChange of exactly one means no difference, and FoldChange values between zero and one means the gene is more expressed in Control. Now, if you take the log base 2 of the FoldChange values, you will have:
if FoldChange > 1, log2( FoldChange ) > 0
if FoldChange = 1, log2( FoldChange ) = 0
if 0 < FoldChange < 1, log2( FoldChange ) < 0
So positive values means the gene is more expressed in Treatment, and negative values means the gene is more expressed in Control.