- Open RStudio or an R terminal.
- Assuming your files are stored in bi/fancy-project/, set the correct working directory:

      > setwd("bi/fancy-project/")

- Read the data and convert to a matrix:

      > table = read.delim("expression-values.txt", row.names=1, header=T, sep="\t")
      > matrix = as.matrix(table)
      > head(matrix)
                        lung_0  lung_3    lung_1    lung_2 stomach_0  stomach_1 stomach_2 stomach_3
      ENSG00000001617  683.958    0.00     0.000  261.9730   957.383    99.6764  699.5950   529.600
      ENSG00000003756 1954.310 3010.74  5925.110 3415.3400   152.799  6527.1900    0.0000  2257.020
      ENSG00000004399  137.821    0.00     0.000   98.4343   541.667  2778.8100  611.6800   686.690
      ENSG00000004534 2220.190 6278.09 12119.300 4787.7500  1647.350 10182.6000 1534.9700  1601.320
      ENSG00000004838 3190.500    0.00     0.000  509.9190   808.094     0.0000  421.9870   584.436
      ENSG00000007402  107.916    0.00   375.487    0.0000   210.593     0.0000   66.2355   544.656
  The txt file is read with tab as the column separator. You can export a tab-delimited text file from your favorite spreadsheet software. If you choose to export to a CSV instead, make sure you change the sep="\t" parameter to sep=",".

- Cluster column and row data
      > hc = hclust(as.dist(1-cor(matrix, method="spearman")), method="complete")
      > hr = hclust(as.dist(1-cor(t(matrix), method="pearson")), method="complete")

- Create a heatmap and plot to a PDF file

      > library(heatmap3)
      > pdf("heatmap.pdf", width = 12, height = 12)
      > heatmap3(matrix, Rowv=as.dendrogram(hr), Colv=as.dendrogram(hc), scale="row", balanceColor=T, showRowDendro=T, labRow=F, ColSideCut=0.6)
      > dev.off()
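The clustering step above can be tried without the real expression file. The following is only a sketch under stated assumptions: the matrix `expr` is synthetic (random values, hypothetical gene and sample names), and base R's `heatmap()` stands in for `heatmap3` so that no extra package is needed.

```r
# Synthetic expression matrix: 6 hypothetical genes x 4 hypothetical samples.
set.seed(1)
expr <- matrix(rexp(24, rate = 0.01), nrow = 6,
               dimnames = list(paste0("gene_", 1:6),
                               c("lung_0", "lung_1", "stomach_0", "stomach_1")))

# Correlation-based distance: 1 - r is 0 for perfectly correlated profiles
# and 2 for perfectly anti-correlated ones.
col_dist <- as.dist(1 - cor(expr, method = "spearman"))    # between samples
row_dist <- as.dist(1 - cor(t(expr), method = "pearson"))  # between genes

# Complete-linkage hierarchical clustering, as in the steps above.
hc <- hclust(col_dist, method = "complete")
hr <- hclust(row_dist, method = "complete")

# Cut the sample tree into two groups to inspect cluster membership.
sample_groups <- cutree(hc, k = 2)
print(sample_groups)

# Base-R heatmap using the same dendrograms, written to a temporary PDF.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 6)
heatmap(expr, Rowv = as.dendrogram(hr), Colv = as.dendrogram(hc), scale = "row")
dev.off()
```

`cor()` correlates the columns of its argument, which is why the matrix is transposed with `t()` when clustering genes (rows).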
get the number of rows for a table or matrix
    > BOD
      Time demand
    1    1    8.3
    2    2   10.3
    3    3   19.0
    4    4   16.0
    5    5   15.6
    6    7   19.8
    > nrow(BOD)
    [1] 6
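As a small complement, here is a sketch of the related size functions applied to the same built-in `BOD` data set. The variable names are illustrative only.

```r
# BOD ships with R (package 'datasets'): 6 rows, 2 columns.
n_rows <- nrow(BOD)   # number of rows
n_cols <- ncol(BOD)   # number of columns
dims   <- dim(BOD)    # both at once: c(rows, columns)

# The same functions work on a matrix.
m <- as.matrix(BOD)
m_rows <- nrow(m)

# A plain vector has no dimensions, so nrow() returns NULL;
# NROW() treats the vector as a one-column matrix instead.
v <- c(2, 1, 3)
v_nrow <- nrow(v)
v_NROW <- NROW(v)

print(dims)
```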
Install cummeRbund on Ubuntu 14.04/ Linux Mint 17.3
Update R to the latest version.
    $ sudo apt-get install libxml2-dev libcurl4-gnutls-dev

    $ sudo R

    > source("https://bioconductor.org/biocLite.R")
    > biocLite("cummeRbund")
Ubuntu 14.04/ Linux Mint 17.3: Update R
Get the latest pre-built deb file here: cran.es.r-project.org
    $ sudo apt-get install r-base
    $ R --version
    R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
    Copyright (C) 2013 The R Foundation for Statistical Computing
    Platform: x86_64-pc-linux-gnu (64-bit)

    $ sudo apt-get install texlive-base
    $ wget http://cran.es.r-project.org/bin/linux/ubuntu/trusty/r-base-core_3.2.3-3trusty0_amd64.deb
    $ sudo dpkg -i r-base-core_3.2.3-3trusty0_amd64.deb
    $ R --version
    R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
    Copyright (C) 2015 The R Foundation for Statistical Computing
    Platform: x86_64-pc-linux-gnu (64-bit)
list loaded libraries
    > (.packages())
    [1] "heatmap3"  "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
    [8] "base"
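Two related base functions may also be useful here. This is a brief sketch; the exact package names returned depend on what your own session has loaded.

```r
# Attached packages (what library() has put on the search path).
# .packages() returns its value invisibly, hence the parentheses when printing.
attached <- (.packages())

# The full search path, including ".GlobalEnv" and entries such as "package:stats".
search_path <- search()

# Namespaces that are loaded, whether or not they are attached.
loaded <- loadedNamespaces()

print(attached)
```

A package can appear in `loadedNamespaces()` without being attached, for example when it was only pulled in as a dependency of another package.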
access a subset of rows/ columns
    > head(BOD)
      Time demand
    1    1    8.3
    2    2   10.3
    3    3   19.0
    4    4   16.0
    5    5   15.6
    6    7   19.8

    > BOD[2:3,]
      Time demand
    2    2   10.3
    3    3   19.0

    > BOD[2:3,1:2]
      Time demand
    2    2   10.3
    3    3   19.0
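Besides numeric ranges, rows and columns can be selected by condition, by name, or by exclusion. The following sketch uses the same `BOD` data set; the thresholds are arbitrary and chosen only for illustration.

```r
# Rows where a condition holds (logical indexing).
high_demand <- BOD[BOD$demand > 15, ]

# All rows except the first (negative index).
without_first <- BOD[-1, ]

# Select a column by name; drop = FALSE keeps the result a data frame.
time_only <- BOD[2:3, "Time", drop = FALSE]

# subset() combines a row condition and a column selection.
sub <- subset(BOD, Time >= 3 & demand < 19, select = demand)

print(high_demand)
```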
sort/ order data table by column
    > head(BOD)
      Time demand
    1    1    8.3
    2    2   10.3
    3    3   19.0
    4    4   16.0
    5    5   15.6
    6    7   19.8

    > order(BOD)
     [1]  1  2  3  4  5  6  7  8 11 10  9 12

    > order(BOD$Time)
    [1] 1 2 3 4 5 6

    > order(BOD$demand)
    [1] 1 2 5 4 3 6

    > BOD[c(1,2,5,4,3,6),]
      Time demand
    1    1    8.3
    2    2   10.3
    5    5   15.6
    4    4   16.0
    3    3   19.0
    6    7   19.8

    > BOD[order(BOD$demand),]
      Time demand
    1    1    8.3
    2    2   10.3
    5    5   15.6
    4    4   16.0
    3    3   19.0
    6    7   19.8

    > BOD[order(BOD$demand, decreasing=T),]
      Time demand
    6    7   19.8
    3    3   19.0
    4    4   16.0
    5    5   15.6
    2    2   10.3
    1    1    8.3
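`order()` also accepts several keys, which helps when the first column contains ties. A minimal sketch follows, assuming a small hypothetical data frame `df` that is not part of the examples above.

```r
# Hypothetical data frame with ties in the first sort key.
df <- data.frame(tissue = c("lung", "stomach", "lung", "stomach"),
                 value  = c(3, 1, 2, 4),
                 stringsAsFactors = FALSE)

# Sort by tissue, then by value within each tissue.
by_tissue_then_value <- df[order(df$tissue, df$value), ]

# Negating a numeric key sorts that key in decreasing order.
by_tissue_then_value_desc <- df[order(df$tissue, -df$value), ]

# The same trick reproduces the decreasing sort of BOD by demand.
bod_desc_index <- order(-BOD$demand)

print(by_tissue_then_value)
```

Negation only works for numeric keys; for a single key of any type, `decreasing = TRUE` as shown above is the simpler choice.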
create a vector
    > vector = c(2,1,3)
    > vector
    [1] 2 1 3
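A few other common ways to build vectors are sketched below. All variable names are illustrative.

```r
# Combine individual values.
numbers <- c(2, 1, 3)

# Integer sequence from 1 to 5.
sequence <- 1:5

# Sequence with a step size.
evens <- seq(2, 10, by = 2)

# Repeat a value.
repeated <- rep("lung", times = 3)

# Named elements can be accessed by name.
named <- c(lung = 2, stomach = 1)
stomach_value <- named[["stomach"]]

# c() also appends to an existing vector.
combined <- c(numbers, 10)

print(evens)
```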
access first row/ column in a data table
    > head(BOD)
      Time demand
    1    1    8.3
    2    2   10.3
    3    3   19.0
    4    4   16.0
    5    5   15.6
    6    7   19.8

    > BOD[1,]
      Time demand
    1    1    8.3

    > BOD[,"demand"]
    [1]  8.3 10.3 19.0 16.0 15.6 19.8
    > BOD$demand
    [1]  8.3 10.3 19.0 16.0 15.6 19.8

    > BOD[,1]
    [1] 1 2 3 4 5 7
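Note that selecting a single row of a data frame returns a data frame, while selecting a single column returns a plain vector. The sketch below shows how to control this with `drop = FALSE`, again using `BOD`.

```r
# A single row is still a data frame.
first_row <- BOD[1, ]

# A single column is simplified to a vector by default ...
first_col_vec <- BOD[, 1]

# ... unless drop = FALSE is given, which keeps the data frame structure.
first_col_df <- BOD[, 1, drop = FALSE]

# [[ ]] extracts one column as a vector, like BOD$Time.
first_col_list <- BOD[[1]]

# A single cell: row 1, column 1.
first_cell <- BOD[1, 1]

print(first_col_df)
```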
read tab-delimited table
    fpkm_table = read.delim("genes.fpkm_table", row.names = 1, header = TRUE, sep="\t")
- header=TRUE: the table has a header row (column names).
- row.names=1: column one contains the table’s row names.
- sep="\t": tab is used to separate columns.
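To see these parameters in action without an existing file, the following sketch writes a tiny table to temporary files and reads it back. The data frame `example` and the file names are made up for illustration; only the `read.delim()` arguments mirror the call above.

```r
# Small hypothetical table with gene IDs as row names.
example <- data.frame(lung_0    = c(683.958, 1954.31),
                      stomach_0 = c(957.383, 152.799),
                      row.names = c("ENSG00000001617", "ENSG00000003756"))

tsv_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")

# col.names = NA writes an empty header cell above the row-name column,
# so the header row has as many fields as the data rows.
write.table(example, tsv_file, sep = "\t", quote = FALSE, col.names = NA)
write.table(example, csv_file, sep = ",",  quote = FALSE, col.names = NA)

# Tab-delimited: header row present, first column holds the row names.
from_tsv <- read.delim(tsv_file, row.names = 1, header = TRUE, sep = "\t")

# Comma-separated: only the separator changes.
from_csv <- read.delim(csv_file, row.names = 1, header = TRUE, sep = ",")

print(from_tsv)
```

For comma-separated files, `read.csv()` is an equivalent shortcut with `sep=","` as its default.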