NGS for natural scientist

R vs Python

So for some reason I am learning more than I planned to.


Last updated 1 year ago

Python is more widely used in general data science, such as machine learning, while R has better support for RNA-seq in terms of available packages (R) or libraries (Python). That is why, at the end of the day, I need to pick up the syntax of both. I have already talked about learning on your own, and here I will show you how I see R and Python from a programming point of view.

Before we start pasting code chunks all over the place, let me sum it up: I think Python makes my life easier when plotting graphs.

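# pheatmap in R
# (assumes dds, vsd and sampleTable already exist from the DESeq2 steps)
library(DESeq2)
library(pheatmap)
df <- as.data.frame(colData(dds)[, "group"])
# pick the 200 genes with the highest mean normalized count
select <- order(rowMeans(counts(dds, normalized = TRUE)), decreasing = TRUE)[1:200]
pheatmap(assay(vsd)[select, ],
         color = colorRampPalette(c("navy", "white", "red"))(100),
         cluster_rows = TRUE,
         show_rownames = FALSE,
         show_colnames = TRUE,
         cluster_cols = FALSE,
         labels_col = paste0(sampleTable$sample, sampleTable$group),
         border_color = NA)

# Boxplot in Python
# (assumes df_metadata is an existing pandas DataFrame)
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(20, 6))
sns.boxplot(data=df_metadata, x="species", y="top_frequency")
plt.xticks(rotation=90)  # rotate the x labels so they do not overlap
plt.show()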

These are two totally different graphs, but what I want to highlight is how much more intuitive it is to plot a graph in Python than in R. In R, a call like pheatmap takes all of its parameters in one go, whereas in Python you can execute the plotting steps one by one. Of course, in R you can copy the whole function call and change the parameters each time you rerun the plot, but for me Python clearly wins on readability.

Anonymous functions in R and Python work in a similar way, but they look different, and the R version is a bit daunting to me.
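
To make the "one by one" point concrete, here is a minimal sketch that builds a boxplot step by step with plain matplotlib. The data, the species names and the variable names are made up for illustration; they are not from the dataset used in this book. It also uses the non-interactive Agg backend so that it runs without a display, which is an assumption about where you run it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical data: a few measurements per species (made up for illustration)
measurements = {
    "species_a": [3, 5, 4, 6],
    "species_b": [7, 8, 6, 9],
    "species_c": [2, 3, 3, 4],
}

# Step 1: create an empty canvas
fig, ax = plt.subplots(figsize=(8, 4))

# Step 2: draw one box per species (boxes sit at x = 1, 2, 3 by default)
ax.boxplot(list(measurements.values()))

# Step 3: label and rotate the x ticks in a separate statement
ax.set_xticks(range(1, len(measurements) + 1))
ax.set_xticklabels(list(measurements.keys()), rotation=90)

# Step 4: add the y-axis label
ax.set_ylabel("top_frequency")

# In an interactive session, plt.show() would now display the figure
```

Each step can be run and checked on its own, which is what I find easier to read compared with one long function call.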

1 %>% {.+1}

lambda x : x + 1
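
The R line above pipes the value 1 into the anonymous function, while the Python line only defines one. As a small sketch of the Python side actually being called (the names `add_one`, `counts` and `incremented` are just for illustration):

```python
# Give the anonymous function a name and call it with 1,
# which mirrors piping 1 into {. + 1} in R
add_one = lambda x: x + 1
result = add_one(1)
print(result)  # 2

# More commonly, a lambda is passed straight into another function
counts = [3, 10, 7]
incremented = list(map(lambda x: x + 1, counts))
print(incremented)  # [4, 11, 8]
```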

For many other reasons too, Python is easier to learn than R, which explains its popularity. So if you want to know both, I would recommend starting with R, because that will ease your Python learning journey a bit.

It is easier to pick up Python if you know R beforehand.

Being familiar with the Linux command line is also an advantage in RNA-seq and in data science in general.

how much you need to know about R before you can do RNA-Seq