Packages
dplyr
tidyverse package for manipulating data, contains the mutate(), filter(), select() functions and the %>% operator
ggplot2
tidyverse package for data visualisation
ggrepel
package to position non-overlapping text labels on a ggplot
readr
tidyverse package for reading data into R, contains the read_tsv() function
tidyr
tidyverse package for data tidying, contains the gather() and spread() functions
tidyverse
a collection of packages that work together for data reading, tidying, manipulating and visualising
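As a minimal sketch of how one of the packages above is loaded and used, the example below chains a few dplyr functions with the pipe. The `counts` data frame, its column names and its values are made up for illustration, and the thresholds in `case_when()` are arbitrary assumptions.

```r
# Load dplyr (assumes it is already installed, e.g. via install.packages("dplyr"))
library(dplyr)

# Hypothetical toy data: gene names with made-up counts
counts <- data.frame(
  GENENAME = c("TP53", "BRCA1", "MYC", "EGFR"),
  count = c(120, 8, 560, 0),
  stringsAsFactors = FALSE
)

result <- counts %>%
  filter(count > 0) %>%                 # choose rows with a non-zero count
  mutate(
    log2_count = log2(count),           # add a log2-transformed column
    level = case_when(                  # test several conditions in order
      count >= 100 ~ "high",
      count >= 10 ~ "medium",
      TRUE ~ "low"
    )
  ) %>%
  select(GENENAME, log2_count, level)   # choose columns

# Extract one column as a vector
genes <- result %>% pull(GENENAME)
```

Each step receives the output of the previous one as its first argument, which is why the data frame name appears only once at the start of the chain.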
Functions
c()
combine values (from base R)
case_when()
test multiple conditions (from dplyr). Can be used instead of multiple ifelse() calls, useful inside mutate() when creating columns
colnames()
access column names (from base R)
colours()
see the built-in R colours (from base R)
dev.off()
turn off the R graphics device (from base R). Used after e.g. pdf(), png().
dim()
retrieve the dimensions of an object, for example, the number of rows and columns (from base R)
factor()
convert values to factor data type (from base R)
filter()
choose rows (from dplyr)
gather()
function that enables converting from wide to long (tidy) format (from tidyr)
ggplot()
function used to create a ggplot (from ggplot2)
head()
return the first part of an object (from base R). Default is to show the first 6 items.
ifelse()
test if a condition is true or not and return a value, ifelse(test, yes, no) (from base R)
labs()
modify title, axis and legend labels on a ggplot (from ggplot2)
levels()
retrieve the levels (category names) of a factor (from base R)
library()
load packages (from base R)
log2()
compute the log2 (base 2) logarithms (from base R)
log10()
compute the log10 (base 10) logarithms (from base R)
mutate()
add columns (from dplyr)
pdf()
create a pdf, used with dev.off() (from base R)
pull()
extract values e.g. out of a column (from dplyr)
read_tsv()
read a tab-separated file into R (from readr)
select()
choose columns (from dplyr)
str()
show the structure of an object (from base R). Useful for checking data types.
summary()
produce a summary of an object (from base R). Useful for getting summary statistics of numeric columns (min, max, mean, median)
tail()
return the last part of an object (from base R). Default is to show the last 6 items.
View()
invoke a spreadsheet-like viewer on an R object (from base R)
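Several of the base R functions listed above can be tried together on a small data frame. This is a sketch only: the `counts` object, its values and the cut-off of 32 are hypothetical choices made for illustration.

```r
# Hypothetical toy data frame; gene names and counts are made up
counts <- data.frame(
  GENENAME = c("TP53", "BRCA1", "MYC", "EGFR", "KRAS", "PTEN", "RB1"),
  count = c(120, 8, 560, 1, 64, 32, 16),
  stringsAsFactors = FALSE
)

dim(counts)             # number of rows and columns
colnames(counts)        # column names
head(counts)            # first 6 rows by default
tail(counts, 2)         # last 2 rows
str(counts)             # structure, including the data type of each column
summary(counts$count)   # min, max, mean, median and quartiles

# Add a log2-transformed column and a two-level label
counts$log2_count <- log2(counts$count)
counts$level <- ifelse(counts$count >= 32, "high", "low")

# Convert the label to a factor with an explicit level order
counts$level <- factor(counts$level, levels = c("low", "high"))
levels(counts$level)

# Write a plot to a pdf in a temporary location, then close the device
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
plot(counts$log2_count)
dev.off()
```

Setting `levels` explicitly in `factor()` controls the order in which categories appear, which otherwise defaults to alphabetical.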
Terms
assignment operator
<-
assigns the value on its right to the object named on its left (from base R)
character
a data type in R, used to represent character strings, quotes indicate the data type is character
console
a window where you can interactively type in commands and the output is returned
data frame
a data structure in R containing multiple columns and/or rows, can contain different data types
double
a data type in R, used to represent numbers containing a decimal point (integer is the data type for numbers without a decimal point)
function
a pre-defined set of commands used to perform a task, can be loaded in from packages or user-created
geom
geometric object, the type of plot drawn on a ggplot, e.g. geom_bar(), geom_density(), geom_boxplot(), geom_violin(), geom_point(), geom_jitter(), geom_text()
matrix
a data structure in R containing multiple columns and/or rows, all values are the same data type
object
everything in R is an object. The assignment operator <- can be used to create objects
pipe
%>%
operator that chains together tidyverse commands, passing the result on its left to the function on its right (from magrittr, made available by dplyr)
scales
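functions that control how data values are mapped to visual properties of a ggplot such as colour or axis position, e.g. scale_colour_brewer(), scale_colour_manual(), scale_x_continuous()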
script
a text file containing commands, in R a script filename ends with .R
themes
the non-data components of a ggplot e.g. background, grid lines, font size and font type
tibble
a tidyverse modern version of a data frame, has nicer printing and subsetting
vector
a data structure in R containing a one-dimensional collection of items of the same data type, e.g. c("TP53", "BRCA1")
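The data types and data structures defined above can be inspected directly in the console. The sketch below uses base R only; all object names and values are made up for illustration.

```r
# Vectors: one-dimensional, all items share one data type
genes <- c("TP53", "BRCA1")   # quotes make these character strings
values <- c(2.5, 10)          # numbers are stored as double by default
whole <- c(2L, 10L)           # the L suffix creates integers

type_genes <- typeof(genes)
type_values <- typeof(values)
type_whole <- typeof(whole)

# Matrix: rows and columns, all values the same data type
m <- matrix(1:6, nrow = 2)

# Data frame: rows and columns, columns can differ in data type
df <- data.frame(gene = genes, value = values, stringsAsFactors = FALSE)
column_types <- sapply(df, typeof)
```

Here `type_genes` is "character", `type_values` is "double" and `type_whole` is "integer", while `column_types` shows that the two data frame columns hold different types.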
Symbols
>
prompt in console, means R is ready to take a command
+
used to add layers to a ggplot. Also the prompt symbol R uses when the command is not complete, such as missing a )
<-
assignment operator, see Terms above
#
comment, to add notes to a script
$
way to access a single column with base R e.g. counts$GENENAME
%in%
operator used to test if a value is in a set of values
==
test if a value is the same as another value
!=
test if a value is not the same as another value
&
and
|
or
%>%
dplyr pipe operator, see Terms above
~
symbol to use when faceting in ggplot2, used to indicate the column to use to facet
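The comparison and logical symbols above each return one TRUE or FALSE per value, which can then be used to choose rows. A minimal base R sketch, using a hypothetical `counts` data frame with made-up values:

```r
# Hypothetical toy data; values are made up
counts <- data.frame(
  GENENAME = c("TP53", "BRCA1", "MYC", "EGFR"),
  count = c(120, 8, 560, 0),
  stringsAsFactors = FALSE
)

# $ accesses a single column as a vector
gene_names <- counts$GENENAME

# %in% tests membership in a set of values
in_set <- counts$GENENAME %in% c("TP53", "MYC")

# == and != test equality and inequality
is_eight <- counts$count == 8
not_eight <- counts$count != 8

# & requires both conditions, | requires at least one
both <- counts$count > 10 & counts$GENENAME != "MYC"
either <- counts$count < 10 | counts$GENENAME == "MYC"

# A logical vector can be used to keep the matching rows
selected <- counts[in_set, ]
```

Inside dplyr's filter() the same conditions can be written with bare column names, without the `counts$` prefix.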