Salmon
How it ended up being named after a fish I have no idea, but I learnt this in Japan, which is a happy coincidence.
Many commonly used mapping workflows are available, and papers are a good starting point for selecting a suitable one. It is painful but true that one has to try many before settling on the most suitable, but considering time, the learning curve, and the foreseeable continued support for the related packages, this manual focuses on the Salmon method.
Salmon has two quantification modes. Practically, without going into technical details: in the first mode, Salmon maps each fragment (the raw reads stored in the fastq file) to an indexed reference genome (here, the cDNA sequences), counts the hit, then moves on to the next. In the second mode, SAM/BAM alignment files are provided to Salmon, and Salmon produces the quantification from the alignment results. For the second mode, one does not need to index the reference genome with Salmon before running the quantification.
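As a rough sketch of how the two modes differ on the command line, assuming Salmon is already installed: the function names and the example file names below are made up for illustration, while -i, -t, -a, -r, -l and -o are options of salmon quant. Here -l A asks Salmon to infer the library type automatically.

```shell
#!/usr/bin/env bash
# Mode 1 (mapping-based): raw reads are mapped against a Salmon index.
# Arguments: index directory, fastq file, output directory.
quant_from_reads() {
  salmon quant -i "$1" -l A -r "$2" -o "$3"
}

# Mode 2 (alignment-based): an existing SAM/BAM file is quantified directly.
# No Salmon index is needed, but the FASTA of the sequences the reads were
# aligned to must be given with -t.
# Arguments: FASTA file, SAM/BAM file, output directory.
quant_from_alignments() {
  salmon quant -t "$1" -l A -a "$2" -o "$3"
}

# Example calls (file names are hypothetical):
#   quant_from_reads cDNA_index sample.fastq sample_quant
#   quant_from_alignments transcripts.fa sample.bam sample_quant
```

The only difference between the two calls is what replaces the index: the first mode needs -i plus reads, the second needs -t plus alignments.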
We want Homo_sapiens.GRCh38.cdna.all.fa.gz, so please click on it to download it. This is the reference genome.
sudo sh miniconda.sh
The last line creates an environment called salmon and installs the package called salmon inside that environment. So every time you want to fire up Salmon, activate the environment first with conda activate salmon.
The sign that you are inside a conda environment is the bracketed environment name in front of your user name in the terminal, like this:
(salmon) user@computer :
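The environment steps can be sketched roughly as follows. This is a sketch under assumptions rather than the exact commands of this manual: the channel names (conda-forge and bioconda) and the two function names are assumptions, while CONDA_DEFAULT_ENV is the variable that conda sets to the name of the active environment.

```shell
#!/usr/bin/env bash
# One-time setup: create an environment named "salmon" that contains the
# salmon package. The channels given with -c are an assumption of this sketch.
create_salmon_env() {
  conda create -y -n salmon -c conda-forge -c bioconda salmon
}

# Succeeds when the active conda environment is called "salmon".
# conda sets CONDA_DEFAULT_ENV whenever an environment is activated.
in_salmon_env() {
  [ "${CONDA_DEFAULT_ENV:-}" = "salmon" ]
}

if in_salmon_env; then
  echo "salmon environment is active"
else
  echo "not in the salmon environment; run: conda activate salmon"
fi
```

The check does the same job as looking for (salmon) in the prompt, but it also works inside scripts, where no prompt is shown.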
Then use this line to index the reference genome, GRCh38.cDNA.fa.gz, for mapping and quantification, and store the indexed files inside cDNA_index:
salmon index -t GRCh38.cDNA.fa.gz -i cDNA_index
Then we can map and quantify the fastq file using Salmon. Our example is sequence data from a single-end library, so the reads are passed with the -r parameter.
Replace -r with -1 and -2 for a paired-end read library to specify the paired .fastq files. The -r parameter is for a single-end library.
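To make the difference concrete, here is a minimal sketch that picks -r or -1/-2 depending on how many fastq files it is given. The function name, the output folder name and the fastq file names are hypothetical, and cDNA_index is the index folder built earlier; -l A lets Salmon infer the library type.

```shell
#!/usr/bin/env bash
# quant_library OUTDIR READS [MATES]
#   one fastq file  -> single-end library, passed with -r
#   two fastq files -> paired-end library, passed with -1 and -2
quant_library() {
  local outdir="$1" reads="$2" mates="${3:-}"
  if [ -z "$mates" ]; then
    salmon quant -i cDNA_index -l A -r "$reads" -o "$outdir"
  else
    salmon quant -i cDNA_index -l A -1 "$reads" -2 "$mates" -o "$outdir"
  fi
}

# Example calls (file names are hypothetical):
#   quant_library sample_quant sample.fastq
#   quant_library sample_quant sample_1.fastq sample_2.fastq
```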
To loop through the whole folder and process all files in one go, first create a .sh script file. You can do that by writing a .txt file in the GUI and then saving it as .sh. One can surely do that within the terminal too, using a favorite text editor such as nano.
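A script of this kind could look roughly like the sketch below. It is not the original script of this manual, so its line numbers will differ. It assumes that the list file holds one sample name per line, that each sample has a matching single-end NAME.fastq in the current folder, and that results go to NAME_quant; those naming choices and the function name are assumptions.

```shell
#!/usr/bin/env bash
# quantify_all LIST INDEX
# LIST holds one sample name per line (for example SRR accession numbers).
quantify_all() {
  local list="$1" index="$2" sample
  # The "|| [ -n ... ]" part also picks up a last line without a newline.
  while IFS= read -r sample || [ -n "$sample" ]; do
    [ -z "$sample" ] && continue            # skip blank lines
    salmon quant -i "$index" -l A \
      -r "${sample}.fastq" \
      -o "${sample}_quant" < /dev/null      # keep salmon away from the list
  done < "$list"                            # the list file feeds the while loop
}

# Defaults follow the names used in this manual.
list="${1:-file.txt}"
index="${2:-cDNA_index}"
if [ -f "$list" ]; then
  quantify_all "$list" "$index"
else
  echo "sample list '$list' not found" >&2
fi
```

Saved as file.sh, it reads file.txt and cDNA_index by default; a different list or index can be given as the first and second argument.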
Run the file with bash file.sh, and quit conda with conda deactivate.
The most common reference genome databases are Ensembl, RefSeq (NCBI), and UCSC. I have worked exclusively with genomes curated by Ensembl, so let's start from there. Google the Ensembl download server and you should safely land on it within the first three hits. The FASTA file of human cDNA is what we are after.
Before installing Salmon, we need to install Miniconda first to provide the Python environment for Salmon.
Refer to the Salmon documentation for the meaning of the parameters.
The file.txt at the end of line 9 is the input that the while loop reads. In other words, the names of the fastq files that the while loop processes come from file.txt, which is essentially the SRR_Acc_List.txt that we generated.