Material web page.
Installation is necessary to reproduce the code shown in the training material. The material is designed for R 4.1 and Bioconductor 3.13 and can be installed in one of the two ways below.
If you’re familiar with Docker, you could use the Docker image, which has all the software pre-configured at the correct versions.
docker run -e PASSWORD=abc -p 8787:8787 mblue9/rnaseq-r-tidyverse:latest
Once the container is running, navigate to http://localhost:8787/ and log in with username rstudio and password abc.
You should see the R Markdown file containing the training material code, which you can run.
Alternatively, you could install the training material using the commands below in R 4.1.
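# Install the remotes package if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
# Install training material package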
remotes::install_github("mblue9/RNAseq-R-tidyverse", build_vignettes = TRUE)
# To view vignettes
library(RNAseqRtidyverse)
vignette("RNAseqRtidyverse")
To run the code, you could then copy and paste it from the workshop vignette or R Markdown file into a new R Markdown file on your computer.
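Because the material targets specific versions of R and Bioconductor, it can help to check the local setup before running the code outside Docker. The following is a minimal sketch, not part of the training material itself. The variable names are illustrative, and it assumes that BiocManager may or may not be installed.

```r
# Versions the material states it was designed for
required_r <- "4.1.0"
required_bioc <- "3.13"

# Compare the running R version against the required one
r_ok <- getRversion() >= required_r
message(
  "R version: ", as.character(getRversion()),
  if (r_ok) " (OK)" else " (older than the version the material targets)"
)

# Only query Bioconductor if BiocManager is available (assumption: it may not be)
if (requireNamespace("BiocManager", quietly = TRUE)) {
  bioc_version <- BiocManager::version()
  bioc_ok <- bioc_version >= required_bioc
  message(
    "Bioconductor version: ", as.character(bioc_version),
    if (bioc_ok) " (OK)" else " (older than the version the material targets)"
  )
} else {
  message("BiocManager is not installed, so the Bioconductor version was not checked.")
}
```

If either check reports an older version, the Docker image described above is likely the simpler route.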
This training material presents how to perform analysis of RNA sequencing data following the tidy data paradigm. The tidy data paradigm provides a standard way to organise data values within a dataset, where each variable is a column, each observation is a row, and data is manipulated using an easy-to-understand vocabulary. Most importantly, the data structure remains consistent across manipulation and analysis functions.
This can be achieved for RNA sequencing data with the tidybulk, tidyHeatmap and tidyverse packages. The tidybulk package provides a tidy data structure and a modular framework for bulk transcriptional analyses and tidyHeatmap provides a tidy implementation of ComplexHeatmap. These packages are part of the tidytranscriptomics suite that introduces a tidy approach to RNA sequencing data.
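To make the tidy data paradigm concrete, the sketch below reshapes a small count table from the usual wide layout (one column per sample) into a tidy layout (one row per gene and sample). The data, the object names and the column names are hypothetical, and only tidyverse functions are used. It illustrates the data layout and does not show the tidybulk API.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical counts in wide layout: one row per gene, one column per sample
counts_wide <- tribble(
  ~gene,   ~sample_1, ~sample_2, ~sample_3,
  "geneA",        10,        20,        30,
  "geneB",         0,         5,        15
)

# Hypothetical sample annotation
sample_info <- tribble(
  ~sample,    ~condition,
  "sample_1", "control",
  "sample_2", "treated",
  "sample_3", "treated"
)

# Tidy layout: each row is one observation (a gene in a sample),
# each column is one variable (gene, sample, count, condition)
counts_tidy <- counts_wide %>%
  pivot_longer(cols = -gene, names_to = "sample", values_to = "count") %>%
  left_join(sample_info, by = "sample")

# The same vocabulary (group_by, summarise) then applies to any summary,
# here the total count per sample
library_sizes <- counts_tidy %>%
  group_by(sample) %>%
  summarise(library_size = sum(count))
```

Because the sample annotation sits alongside the counts in the same table, later steps such as filtering, plotting or grouping by condition need no separate bookkeeping between a count matrix and a metadata table.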
Recommended Background Reading: Introduction to R for Biologists
Exploring and analysing RNA sequencing data requires an understanding of several key concepts, such as filtering, scaling, dimensionality reduction, hypothesis testing, clustering and visualisation. These concepts can be explained intuitively to new users. However, three obstacles impede effective learning of the statistics and biology underlying an informed RNA sequencing analysis: (i) the heterogeneous vocabulary and jargon used by methodologies, algorithms and packages, (ii) the complexity of data wrangling, and (iii) the coding burden.
The tidytranscriptomics approach to RNA sequencing data analysis abstracts out the coding-related complexity and provides tools that use an intuitive and jargon-free vocabulary, enabling focus on the statistical and biological challenges.