Acknowledgements

Thanks to Anna Trigos and Franco Caramia for suggestions for this challenge.

Challenge

You’ve been working with RNA-seq samples from mouse mammary gland basal and luminal cells to try to better understand breast cancer. Now you want to check the expression of some genes in human basal and luminal breast cancer samples. You decide to use the TCGA breast cancer dataset, as it has hundreds of samples and the RNA-seq counts are available to download.

The aim of this challenge is to generate boxplots of RNA-seq expression for several genes, similar to below. The subtypes (e.g. basal, luminal) have been added to the counts file, and the file has been formatted and subset for you. Feel free to choose different colours and add any other modifications that you think make the plot look better.

Steps

  • Read in the file called tcga_rna.tsv.gz and save it as an object called exp.
  • First make a boxplot for the ESR1 gene (estrogen receptor). Plot the subtypes (PAM50 column) on the x axis and the RNA-seq counts (Value column) on the y axis. Hint: use filter.
  • Then make boxplots for the genes ESR1, ERBB2, CD8A, CD3D, AURKA, IFNG. Hint: use facet_wrap.
  • Remove the NA category from the x axis and the legend. Hint: use drop_na.
  • Change the x axis label to Subtype and the y axis label to Expression.
  • Save the boxplots as a PDF called TCGA_boxplots.pdf.
  • Email the instructor your PDF and code.
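
If you are unsure how the hinted functions fit together, the sketch below shows one possible shape of the pipeline on a small made-up table. It is not the solution for the real data: the toy values, the gene column name Gene, the facet scales and the output file name are all assumptions made for illustration, and only the PAM50 and Value column names come from the steps above. For the real file you would read tcga_rna.tsv.gz into exp first (for example with readr::read_tsv) and check its actual column names.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for the real counts table (values are random, not TCGA data).
# `Gene` is an assumed column name; PAM50 and Value are named in the steps.
set.seed(1)
exp <- tibble(
  Gene  = rep(c("ESR1", "ERBB2", "TP53"), each = 8),
  PAM50 = rep(c("Basal", "LumA", "LumB", NA), times = 6),
  Value = rpois(24, lambda = 500)
)

# Genes to plot (a shortened list for the toy example)
genes <- c("ESR1", "ERBB2")

# Keep only the genes of interest, then drop samples with no subtype
plot_data <- exp %>%
  filter(Gene %in% genes) %>%
  drop_na(PAM50)

# One boxplot panel per gene, with subtype on x and expression on y
p <- ggplot(plot_data, aes(x = PAM50, y = Value, fill = PAM50)) +
  geom_boxplot() +
  facet_wrap(~Gene, scales = "free_y") +
  labs(x = "Subtype", y = "Expression")

# Write the plot to a PDF in a temporary directory
# (hypothetical file name; the challenge asks for TCGA_boxplots.pdf)
out_file <- file.path(tempdir(), "toy_boxplots.pdf")
ggsave(out_file, plot = p, width = 8, height = 5)
```

Filtering with %in% rather than == lets the same code handle one gene or several, and dropping the NA rows before plotting removes the NA category from both the axis and the legend in one step.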