PATH

I fell in love with Linux after this little incident, so take my opinion as coming from someone with a crush. Setting things up on Linux is not as difficult as on a Mac, but once it is done you have learned something important.


Last updated 2 years ago

If you are not working on your own data, the SRA Toolkit is a set of executables, run from the Linux terminal, that download (fetch) sequencing files from SRA/GEO, the largest database of raw NGS data. It is not strictly necessary to install this particular toolkit, because there are separate packages that perform the same tasks. But in case you opt for it, which I do, you can refer to the links at the bottom of this page.

Assigning the PATH to point to the SRA Toolkit location was the first thing I learnt about working with Linux. In human terms, when you type a program name and try to run it in the terminal, the system first needs to know where to look for the executable file. For example, I need to prefetch the raw sequencing data named SRA0000001. The prefetch executable is located in /home/user/sratoolkit/bin. So I type in

prefetch SRA0000001

But what you might get instead is an error message that amounts to "what is prefetch?" (in bash, typically "prefetch: command not found"), because the terminal will never search for this prefetch guy (the executable) in /home/user/sratoolkit/bin until you tell it to do so. The way to add this directory to future searches is to do

export PATH=$PATH:/home/user/sratoolkit/bin
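To see the before and after for yourself without installing anything, here is a small sketch. It assumes a throwaway directory and a fake program called prefetch_demo, both made up for illustration, standing in for /home/user/sratoolkit/bin and the real prefetch.

```shell
# Hypothetical stand-in for /home/user/sratoolkit/bin so the sketch runs anywhere.
demo_bin="$(mktemp -d)"
printf '#!/bin/sh\necho "pretend prefetch got: $1"\n' > "$demo_bin/prefetch_demo"
chmod +x "$demo_bin/prefetch_demo"

# Before: the shell cannot find the program because its directory is not on PATH.
if command -v prefetch_demo >/dev/null 2>&1; then
  before="found"
else
  before="not found"
fi
echo "before: $before"

# Append the directory to PATH (this only affects the current session).
export PATH="$PATH:$demo_bin"

# After: the shell now resolves the name to the file inside the new directory.
after="$(command -v prefetch_demo)"
echo "after: $after"
prefetch_demo SRA0000001

# Print the PATH entries one per line to see every place the shell searches.
echo "$PATH" | tr ':' '\n'
```

The last command is a handy habit: whenever a program "does not exist", listing PATH this way shows right away whether its directory is among the places being searched.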

It looks easy, right? The catch is that once you end the session by restarting the terminal and check the PATH variable with echo $PATH, chances are the /home/user/sratoolkit/bin entry is gone.

GONE.
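As far as I understand it, this happens because an exported variable lives only in the shell where it was set (and in whatever that shell starts), and a reopened terminal is a brand new shell. The sketch below imitates that with a subshell instead of a real terminal restart; on_path is a helper name I made up for this example.

```shell
# The example directory from the text above.
extra_dir="/home/user/sratoolkit/bin"

# Hypothetical helper: report whether a directory is listed in PATH.
on_path() {
  case ":$PATH:" in
    *":$1:"*) echo "present" ;;
    *)        echo "absent"  ;;
  esac
}

# $( ... ) runs in a separate subshell: the export happens there and nowhere else.
inside="$(export PATH="$PATH:$extra_dir"; on_path "$extra_dir")"

# Back in the original shell, the subshell's change did not come with us.
outside="$(on_path "$extra_dir")"

echo "inside the subshell:  $inside"
echo "back in this shell:   $outside"
```

This assumes the directory is not already on your PATH; if it is, both lines will simply say "present".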

I then headed to my best friend, Google.

There are several text editors in Linux. The one I like is nano, but use whichever you prefer. Just open the ~/.bashrc file, add the line export PATH="/home/user/sratoolkit/bin:$PATH" AT THE END OF THE FILE, and then REOPEN your terminal.

Don't ask me why at the end of the file, why this file, why reopen, [del]and why do I bother to live[/del]. I was more than exhausted by the time I found my solution.
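For what it is worth, the short version as I now understand it: bash reads ~/.bashrc each time it starts a new interactive (non-login) shell, which is what a terminal window typically opens, so a line placed there is re-run in every new session. If you prefer not to open an editor, the same edit can be sketched from the command line. The append_once helper below is a name I made up, and the sketch only practises on a scratch file; the real call is left commented out.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already in it.
append_once() {
  file="$1"
  line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# The line from the text above (single quotes keep $PATH literal until the file is read).
line='export PATH="/home/user/sratoolkit/bin:$PATH"'

# Practice run on a scratch file so nothing real is touched.
scratch="$(mktemp)"
append_once "$scratch" "$line"
append_once "$scratch" "$line"   # the second call finds the line and adds nothing
count="$(grep -cxF -- "$line" "$scratch")"
echo "line appears $count time(s) in $scratch"

# For the real thing, uncomment the next line, then reopen the terminal
# (or run: source ~/.bashrc) so the current session picks it up.
# append_once "$HOME/.bashrc" "$line"
```

Checking before appending means you can run it again after a reinstall without stacking duplicate PATH entries in the file.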

SRA/GEO
https://github.com/ncbi/sra-tools/wiki/02.-Installing-SRA-Toolkit
https://www.ncbi.nlm.nih.gov/sra/docs/sradownload/
https://askubuntu.com/questions/1195236/how-to-permanently-change-a-path-variable-in-debian