R and RStudio
Of all the programming tools I have used so far, R is the one I have the strongest feelings about, though that is beside the point.
Make sure base R is installed before you install RStudio.
https://www.rstudio.com/products/rstudio/download-server/debian-ubuntu/
Installation on Windows is straightforward, but installation on Linux has to be done in the terminal.
R is a statistics-oriented language that is widely used in the world of RNA-seq and, presumably, in data science. Learning to find your way around R is a necessary hoop to jump through before becoming fluent in NGS and big data analysis. R itself is text-based and runs from the terminal on Linux as well as on Windows; RStudio is a graphical user interface for R that makes data processing easier.
I was born street-smart but worked my way into being a mediocre book person, which essentially turned me into a bad skimmer, so I found that DataCamp taught me more R than the books listed below. That said, I am not going to lie: a hybrid of the two is the mode I have adopted, and I am doing pretty fine. I used to use GraphPad for basic biostatistics and graph generation. Because R is open source, I switched to ggplot2, and now I love it. The learning curve is not gentle, and that thought haunted me for years beforehand, but the reward is worth it. Even now I do not regret having used GraphPad or having switched to ggplot2, and had I not switched, I would have no regrets either. Of that much I am sure, and I can tell you it doesn't matter.
This is my favorite R book.
Everything about ggplot2. This is what you should read after you are done with the above, if you want to dive into ggplot2.
Then comes the real deal (sorry).
How much R does one need to know before processing big data? If you are using published packages, all you need to know is ggplot2, data types, and the relevant syntax. You can pick up everything except ggplot2 in about 2 hours on DataCamp. But if you are ambitious and would like to build your own statistical models, I do not think you would be reading this book at all.
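As a rough idea of what "data types and the relevant syntax" covers, here is a minimal base R sketch. The gene names, counts, and the helper log2_counts are made up for illustration and do not come from any real dataset or package.

```r
# Hypothetical values, for illustration only.
genes  <- c("geneA", "geneB", "geneC")            # character vector
counts <- c(120, 0, 35)                           # numeric vector
group  <- factor(c("ctrl", "ctrl", "treated"))    # factor (categorical data)

# class() reports the type of an object.
class(genes)
class(counts)
class(group)

# Vectors can be named, then indexed by name, position, or condition.
names(counts) <- genes
counts["geneA"]                     # the element named geneA
expressed <- counts[counts > 0]     # keep only the non-zero counts

# A list can hold items of different types.
sample_info <- list(id = "S1", n_genes = length(genes), paired = TRUE)

# A small function: log2-transform counts with a pseudocount of 1.
log2_counts <- function(x) log2(x + 1)
log2_counts(counts)
```

Vectors, factors, lists, indexing, and small functions like these are most of what you need to read and adapt the example code that published packages ship with.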
A solid foundation in R's syntax is entirely optional for doing RNA-seq, but it can be an advantage in data science and a pathway to Python: Python's syntax resembles R's but is so much lighter that I was literally able to code in Python within a couple of days, starting from scratch. I have listed the things that mattered to me here.
The data frame, a signature feature of R and (via pandas) of Python, is essential for data science.
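To make the data frame idea concrete, here is a minimal sketch with a toy table. The sample names, conditions, and expression values are invented for illustration. The plotting step at the end is optional and runs only if ggplot2 happens to be installed.

```r
# Hypothetical toy data, for illustration only.
df <- data.frame(
  sample     = c("S1", "S2", "S3", "S4"),
  condition  = c("ctrl", "ctrl", "treated", "treated"),
  expression = c(5.1, 4.9, 8.2, 7.8)
)

str(df)                                      # column names and types
df$expression                                # one column, as a vector
treated <- df[df$condition == "treated", ]   # filter rows by a condition

# Mean expression per condition.
group_means <- aggregate(expression ~ condition, data = df, FUN = mean)
print(group_means)

# ggplot2 takes a data frame as its input.
# This part is skipped if ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df, ggplot2::aes(x = condition, y = expression)) +
    ggplot2::geom_boxplot()
  print(p)
}
```

Each column is a vector of one type and each row is one observation, which is the shape that ggplot2 and most analysis packages expect.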
Setting up RStudio for RNA-Seq (either Linux or Windows)
Bioconductor is a topic-specific (genomic data processing) repository of R packages. It provides its own package, BiocManager, to manage the packages downloaded from the site. Do all of this in R/RStudio, not in the terminal.
install.packages("BiocManager")
You can then use BiocManager to download and manage packages released on Bioconductor. Taking fastqcr as an example, I would run the following:
BiocManager::install('fastqcr')
After the console has finished running, check the library (by default, the Packages tab on the right side of RStudio) to see whether fastqcr is on the list (you may need to restart RStudio). If it is not, go back to the console and look for errors. Follow the instructions to install the prerequisites (dependencies), then run the same command again to get what you need. The same process applies to any other package you install later. Load the package with library(fastqcr), or simply check the box next to the package name in the Packages tab. For details, refer to the above tutorial written by Alboukadel Kassambara, because you will Google a lot along the way no matter how meticulous I try to be here. Get used to the usual format and get comfortable finding existing resources.
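If you would rather check from the console than from the Packages tab, something like the following sketch could work. The helper missing_packages is a hypothetical name of my own, not part of BiocManager or fastqcr; it relies only on base R's requireNamespace(). The install line is left commented out because it needs BiocManager and a network connection.

```r
# Hypothetical helper: return the names of the packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "stats" ships with R; fastqcr may or may not be installed on your machine.
wanted <- c("stats", "fastqcr")
to_install <- missing_packages(wanted)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) BiocManager::install(to_install)
```

An empty result means everything in wanted is already available and can be loaded with library().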