Bulk RNAseq: Differential Expression Analysis
Environment Setup
salloc -N 1 --exclusive -p amd -t 8:00:00
# use the same env used for #1
conda activate smallrna
# working dir
mkdir -p /work/LAS/geetu-lab/arnstrm/mouse.ev.RNAseq
cd /work/LAS/geetu-lab/arnstrm/mouse.ev.RNAseq
# file structure
tree -L 1
.
├── 1_data
├── 2_fastqc
├── 3_STAR-mapping
├── 4_featureCounts
└── 5_multiqcRaw data
Raw data was downloaded from the sequencing facility using the
rsync command, with authentication. The downloaded files
were checked for md5sum and compared against list of files expected as
per the input samples provided.
cd 1_data
rsync -rltvPh ext-arnstrm@download.bioinformatics.missouri.edu:xyz .
# link masked
# GEO link will be included laterGenome/annotation
Additional files required for the analyses were downloaded from GenCode. The downloaded files are as follows:
cd 3_STAR-mapping
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M30/GRCm39.primary_assembly.genome.fa.gz
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M30/gencode.vM30.annotation.gff3.gz
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M30/gencode.vM30.annotation.gtf.gz
gunzip GRCm39.primary_assembly.genome.fa.gz
gunzip gencode.vM30.annotation.gff3.gz
gunzip gencode.vM30.annotation.gtf.gz
# ids for prot coding genes
awk '$3=="gene" {print $9}' gencode.vM30.annotation.gff3 |\
cut -f 1-3 -d ";" |\
sed -e 's/;/\t/g' -e 's/gene_id=//g' -e 's/ID=//g' |\
grep "gene_type=protein_coding" |\
cut -f 1 > mm10.protein_codingFastQC
Quality inspection of the reads. The multiqc report,
collating all samples together are provided as html file.
cd 2_fastqc
for fq in ../1_data/*.fq.gz; do
fastqc --threads $SLURM_JOB_CPUS_PER_NODE $fq;
doneMapping
To index the genome, following command was run (in an interactive session).
fastaGenome="GRCm39.genome.fa"
gtf="gencode.vM30.annotation.gtf"
STAR --runThreadN $SLURM_JOB_CPUS_PER_NODE \
--runMode genomeGenerate \
--genomeDir $(pwd) \
--genomeFastaFiles $fastaGenome \
--sjdbGTFfile $gtf \
--sjdbOverhang 1Each fastq file was mapped to the indexed genome as
using runSTAR_map.sh script shown below:
#!/bin/bash
read1=$1
read2=$(echo ${read1} | sed 's/_L003_R1_001.fastq.gz/_L003_R2_001.fastq.gz/g')
cpus=${SLURM_JOB_CPUS_PER_NODE}
out=$(basename ${read1} | sed 's/_L003_R1_001.fastq.gz//g')
index=/work/LAS/geetu-lab/arnstrm/GRCm39_index
STAR \
--runThreadN ${cpus} \
--genomeDir ${index} \
--outSAMtype BAM SortedByCoordinate \
--quantMode GeneCounts \
--outFilterScoreMinOverLread 0.3 \
--outFilterMatchNminOverLread 0.3 \
--outFileNamePrefix ${out}_ \
--readFilesCommand zcat \
--outWigType bedGraph \
--outWigStrand Unstranded \
--outWigNorm RPM \
--readFilesIn ${read1} ${read2}Mapping was run with a simple loop:
for fq in *_R1_*fastq.gz; do
runSTAR_map.sh $fq;
doneCounts
For generating counts from the mapped reads, we used
subread package program featureCounts. All bam
files were supplied togehter to generate a single count file for
individual samples.
cd 3_STAR-mapping
realpath *.bam > ../4_featureCounts/bam.fofn
cd ../4_featureCounts
while read line; do
ln -s $line;
done
featureCounts \
-T ${SLURM_CPUS_ON_NODE} \
-a gencode.vM30.annotation.gtf \
-t exon \
-g gene_id \
-p \
-B \
--countReadPairs \
-o merged_counts.txt \
--tmpDir ./tmp *.bamThe generated counts file was processed to use it direclty with
DESeq2
cut -f 1,7- merged_counts.txt |\
grep -v "^#" |\
sed 's/_S...\?_Aligned.sortedByCoord.out.bam//g' |\
sed 's/mNCSC_//g' > merged_counts-clean.tsv
grep -Fw -f mm10.protein_coding merged_counts-clean.tsv > body
head -n 1 merged_counts-clean.tsv > head
cat head body >> merged_counts-clean-prot.tsv
rm head bodyCreate a info file:
head -n 1 merged_counts-clean.tsv |\
tr "\t" "\n" |\
grep -v "^Geneid" |\
awk '{print $1"\t"$1}' |\
sed 's/_.$//g' > info.tsvDifferential expression analysis
Differential expression (DE) analyses using DESeq2 was
performed as shown below.
Prerequisites
R packages required for this section are loaded
# set path
setwd("/work/LAS/geetu-lab/arnstrm/mouse.trophoblast.smallRNAseq")
# load the modules
library(tidyverse)
library(DESeq2)
library(vsn)
library(pheatmap)
library(ggrepel)
library(RColorBrewer)
library(reshape2)
require(biomaRt)
library(TissueEnrich)
library(plotly)
library(cowplot)
library(biomaRt)
library(scales)
library(kableExtra)
library(htmlwidgets)
library(DT)
library(enrichR)Import datasets
The counts data and its associated metadata
(coldata) are imported for analyses.
countsFile = 'assets/merged_counts-clean-prot.tsv'
groupFile = 'assets/info.tsv'
coldata <-
read.csv(
groupFile,
row.names = 1,
sep = "\t",
header = FALSE,
stringsAsFactors = TRUE
)
colnames(coldata) <- "condition"
cts <- as.matrix(read.delim(countsFile, row.names = 1, header = TRUE))Reorder columns of cts according to coldata
rows. Check if samples in both files match.
all(rownames(coldata) %in% colnames(cts))
#> [1] TRUE
cts <- cts[, rownames(coldata)]DESeq2
The batch corrected read counts are then used for running DESeq2 analyses
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = coldata,
design = ~ condition)
vsd <- vst(dds, blind = FALSE)
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
dds <- DESeq(dds)PCA plot for QC
PCA plot for the dataset that includes all libraries.
rv <- rowVars(assay(vsd))
pcaData <-
plotPCA(vsd,
intgroup = "condition",
returnData = TRUE)
percenttVar1 <- round(100 * attr(pcaData, "percentVar"))
# @ Thomas W. Battaglia
#' Plot DESeq2's PCA plotting with Plotly 3D scatterplot
#'
#' The function will generate a plot_ly 3D scatter plot image for
#' a 3D exploration of the PCA.
#'
#' @param object a DESeqTransform object, with data in assay(x), produced for example by either rlog or varianceStabilizingTransformation.
#' @param intgroup interesting groups: a character vector of names in colData(x) to use for grouping
#' @param ntop number of top genes to use for principal components, selected by highest row variance
#' @param returnData should the function only return the data.frame of PC1, PC2 and PC3 with intgroup covariates for custom plotting (default is FALSE)
#' @return An object created by plot_ly, which can be assigned and further customized.
#' @export
plotPCA3D <-
function (object,
intgroup = "condition",
ntop = 500,
returnData = FALSE) {
rv <- rowVars(assay(object))
select <-
order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
pca <- prcomp(t(assay(object)[select,]))
percentVar <- pca$sdev ^ 2 / sum(pca$sdev ^ 2)
if (!all(intgroup %in% names(colData(object)))) {
stop("the argument 'intgroup' should specify columns of colData(dds)")
}
intgroup.df <-
as.data.frame(colData(object)[, intgroup, drop = FALSE])
group <- if (length(intgroup) > 1) {
factor(apply(intgroup.df, 1, paste, collapse = " : "))
}
else {
colData(object)[[intgroup]]
}
d <- data.frame(
PC1 = pca$x[, 1],
PC2 = pca$x[, 2],
PC3 = pca$x[, 3],
group = group,
intgroup.df,
name = colnames(object)
)
if (returnData) {
attr(d, "percentVar") <- percentVar[1:3]
return(d)
}
message("Generating plotly plot")
p <- plotly::plot_ly(
data = d,
x = ~ PC1,
y = ~ PC2,
z = ~ PC3,
color = group,
mode = "markers",
type = "scatter3d"
)
return(p)
}
select <-
order(rv, decreasing = TRUE)[seq_len(min(500, length(rv)))]
pca <- prcomp(t(assay(vsd)[select, ]))
percentVar <- pca$sdev ^ 2 / sum(pca$sdev ^ 2)
intgroup = "condition"
intgroup.df <- as.data.frame(colData(vsd)[, intgroup, drop = FALSE])
group <- if (length(intgroup) == 1) {
factor(apply(intgroup.df, 1, paste, collapse = " : "))
}
d <- data.frame(
PC1 = pca$x[, 1],
PC2 = pca$x[, 2],
intgroup.df,
name = colnames(vsd)
)
ggplot(d, aes(PC1, PC2, color = condition)) +
scale_shape_manual(values = 1:12) +
scale_color_manual(values = c('mTSC_EV' = '#0178c7',
'No_EV' = '#7acc37',
'pTGC_EV' = '#8e27af')) +
theme_bw() +
theme(legend.title = element_blank()) +
geom_point(size = 2, stroke = 2) +
geom_text_repel(aes(label = name)) +
xlab(paste("PC1", round(percentVar[1] * 100, 2), "% variance")) +
ylab(paste("PC2", round(percentVar[2] * 100, 2), "% variance"))Figure 4: PCA plot for the first 2 principal components
Interactive 3D PCA plot
g= plotPCA3D(vsd, intgroup = "condition")
gView the interactive 3D PCA plot in a new tab.
Sample distance for QC
sampleDists <- dist(t(assay(vsd)))
sampleDistMatrix <- as.matrix( sampleDists )
rownames(sampleDistMatrix) <- colnames(vsd)
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
clustering_distance_rows = sampleDists,
clustering_distance_cols = sampleDists,
col = colors)Figure 5: Euclidean distance between samples
Set contrasts and find DE genes
res.NoEvsTSC <-
results(dds,
contrast = c(
"condition",
"No_EV",
"mTSC_EV"))
res.NoEvsTGC <-
results(dds,
contrast = c(
"condition",
"No_EV",
"pTGC_EV"))Functions
Processing DE objects
processDE <- function(res.se, string) {
fc = 1.5
log2fc = log(fc, base = 2)
neg.log2fc = log2fc * -1
res.se <- res.se[order(res.se$padj), ]
res.data <-
merge(as.data.frame(res.se),
as.data.frame(counts(dds, normalized = TRUE)),
by = "row.names",
sort = FALSE)
names(res.data)[1] <- "Gene"
res.up <-
res.data %>%
filter(log2FoldChange >= log2fc) %>%
filter(padj <= 0.05) %>%
arrange(desc(log2FoldChange)) %>%
dplyr::select(Gene)
res.dw <-
res.data %>%
filter(log2FoldChange <= neg.log2fc) %>%
filter(padj <= 0.05) %>%
arrange(desc(log2FoldChange)) %>%
dplyr::select(Gene)
res.up.new <-
annot[annot$ensembl_gene_id_version %in% res.up$Gene,]
res.dw.new <-
annot[annot$ensembl_gene_id_version %in% res.dw$Gene,]
res.data.info <-
left_join(res.data, mart, by = c('Gene' = 'ensembl_gene_id_version'))
res.data.filtered <- res.data.info %>%
filter(padj <= 0.05) %>%
filter(log2FoldChange >= log2fc | log2FoldChange <= neg.log2fc) %>%
arrange(desc(log2FoldChange))
pce.up1 <- paste0(string, ".up.pce", 1)
pce.dw1 <- paste0(string, ".dw.pce", 1)
pce.up2 <- paste0(string, ".up.pce", 2)
pce.dw2 <- paste0(string, ".dw.pce", 2)
DEGtable <- paste0(string, ".DE.table")
assign(pce.up1, as.character(res.up.new$ensembl_gene_id), envir = .GlobalEnv)
assign(pce.dw1, as.character(res.dw.new$ensembl_gene_id), envir = .GlobalEnv)
assign(pce.up2, as.character(res.up.new$external_gene_name), envir = .GlobalEnv)
assign(pce.dw2, as.character(res.dw.new$external_gene_name), envir = .GlobalEnv)
assign(DEGtable, res.data.info, envir = .GlobalEnv)
# save full table
write_delim(
res.data.info,
file = paste0("results/DESeq2results-", string, "_fc.tsv"),
delim = "\t"
)
# save filtered table (fc = 1.5 & padj <= 0.05)
write_delim(
res.data.filtered,
file = paste0("results/DE_", string, "_filtered.tsv"),
delim = "\t"
)
}Gene information
ensembl = useMart("ENSEMBL_MART_ENSEMBL")
listDatasets(ensembl) %>%
filter(str_detect(description, "Mouse"))
ensembl = useDataset("mmusculus_gene_ensembl", mart = ensembl)
listFilters(ensembl) %>%
filter(str_detect(name, "ensembl"))
filterType <- "ensembl_gene_id_version"
head(rownames(counts))
counts <- read.delim(countsFile, row.names = 1, header = TRUE)
head(rownames(counts))
filterValues <- rownames(counts)
listAttributes(ensembl) %>%
head(20)
attributeNames <- c('ensembl_gene_id_version',
'ensembl_gene_id',
'external_gene_name')
annot <- getBM(
attributes = attributeNames,
filters = filterType,
values = filterValues,
mart = ensembl
)
attributeNames <- c('ensembl_gene_id_version',
'gene_biotype',
'external_gene_name',
'description')
mart <- getBM(
attributes = attributeNames,
filters = filterType,
values = filterValues,
mart = ensembl
)
write_delim(
annot,
file = "assets/annot.tsv",
delim = "\t"
)
write_delim(
mart,
file = "assets/mart.tsv",
delim = "\t"
) Files were saved, so we don’t query BioMart everytime we run the markdown. The files will be loaded, instead
mart <-
read.csv(
"assets/mart.tsv",
sep = "\t",
header = TRUE,
)
annot <-
read.csv(
"assets/annot.tsv",
sep = "\t",
header = TRUE,
)Volcano plotting function
g <- volcanoPlots <-
function(res.se,
string,
first,
second,
color1,
color2,
color3,
ChartTitle) {
fc = 1.5
log2fc = log(fc, base = 2)
neg.log2fc = log2fc * -1
res.se <- res.se[order(res.se$padj),]
res.se <-
rownames_to_column(as.data.frame(res.se[order(res.se$padj), ]))
names(res.se)[1] <- "Gene"
res.data <-
merge(res.se,
mart,
by.x = "Gene",
by.y = "ensembl_gene_id_version")
res.data <- res.data %>% mutate_all(na_if, "")
res.data <- res.data %>% mutate_all(na_if, " ")
res.data <-
res.data %>% mutate(external_gene_name = coalesce(external_gene_name, Gene))
res.data$diffexpressed <- "other.genes"
res.data$diffexpressed[res.data$log2FoldChange >= log2fc &
res.data$padj <= 0.05] <-
paste("Higher expression in", first)
res.data$diffexpressed[res.data$log2FoldChange <= neg.log2fc &
res.data$padj <= 0.05] <-
paste("Higher expression in", second)
upgenes <- res.data %>%
dplyr::filter(log2FoldChange >= log2fc & padj <= 0.05) %>%
arrange(desc(log2FoldChange)) %>%
mutate(delabel = external_gene_name) %>%
select(Gene, delabel) %>%
top_n(10)
downgenes <- res.data %>%
dplyr::filter(log2FoldChange <= neg.log2fc &
padj <= 0.05) %>%
arrange(desc(log2FoldChange)) %>%
mutate(delabel = external_gene_name) %>%
select(Gene, delabel) %>%
top_n(10)
fullgenes <- rbind(upgenes, downgenes)
res.data <- left_join(res.data, fullgenes, by = "Gene")
res.data$delabel[res.data$external_gene_name == "Htr2b"] <-
"Htr2b"
ggplot(res.data,
aes(
x = log2FoldChange,
y = -log10(padj),
col = diffexpressed,
label = delabel
)) +
geom_point(alpha = 0.5) +
xlim(-2.5, 2.5) +
theme_classic() +
scale_color_manual(name = "Expression", values = c(color1, color3, color2)) +
geom_label_repel(
data = res.data,
aes(size = 0.5, point.size = 0.5),
max.overlaps = Inf,
force_pull = 0,
min.segment.length = 0.5,
show.legend = F,
seed = 11,
box.padding = 0.5
) +
ggtitle(ChartTitle) +
xlab(paste("log2 fold change")) +
ylab("-log10 pvalue (adjusted)") +
theme(legend.text.align = 0)
}GenePlot function
gnCountFn <- function(geneName="ENSMUSG00000026228.7") {
geneCounts <- as.data.frame(counts(dds, normalized = TRUE)[geneName, ])
colnames(geneCounts) <- "normalizedCounts"
geneCounts <- geneCounts %>%
rownames_to_column("condition") %>%
mutate(condition = str_sub(condition, 1, str_length(condition)-2))
geneCounts$condition <- factor(geneCounts$condition,
levels=c('No_EV', 'mTSC_EV', 'pTGC_EV'))
g <- ggplot(geneCounts, aes(x = condition, y = normalizedCounts, fill = condition)) +
geom_boxplot(color = "black") +
stat_summary(
fun.y = mean,
geom = "point",
shape = 23,
size = 2
) +
geom_dotplot(
binaxis = 'y',
binwidth = 1,
stackdir = 'center',
dotsize = 0.75
) +
scale_fill_manual(values = c(
"No_EV" = "#7acc37",
"mTSC_EV" = "#0178c7",
"pTGC_EV" = "#8e27af"
)) +
theme_clean()
g
}TissueEnrich function
source("assets/theme_clean.R")
plotTE <- function(inputGenes = gene.list,
myColor = "color") {
gs <-
GeneSet(geneIds = inputGenes,
organism = "Mus Musculus",
geneIdType = SymbolIdentifier())
output <- teEnrichment(inputGenes = gs, rnaSeqDataset = 3)
en.output <-
setNames(data.frame(assay(output[[1]]),
row.names = rowData(output[[1]])[, 1]),
colData(output[[1]])[, 1])
en.output$Tissue <- rownames(en.output)
logp <- -log10(0.05)
en.output <-
mutate(en.output,
significance = ifelse(Log10PValue > logp,
"colored", "nocolor"))
en.output$Sig <- "NA"
ggplot(en.output, aes(reorder(Tissue, Log10PValue),
Log10PValue,
fill = significance)) +
geom_bar(stat = 'identity') +
theme_clean() + ylab("- log10 adj. p-value") + xlab("") +
scale_fill_manual(values = c("colored" = myColor, "nocolor" = "gray")) +
scale_y_continuous(expand = expansion(mult = c(0, .1)),
breaks = scales::pretty_breaks()) +
coord_flip()
}enrichR function
plotEnrichR <- function(enriched, table="string", myColor = "slateblue") {
logp <- -log10(0.05)
myData <- enriched[[table]]
myData$negLogP <- -log10(myData$P.value)
myData <-
mutate(myData,
significance = ifelse(negLogP > logp, "colored", "nocolor"))
myData$Sig <- "NA"
myData <- head(arrange(myData, -negLogP, Term), 15)
ggplot(myData, aes(reorder(Term, negLogP),
negLogP,
fill = significance)) +
geom_bar(stat = 'identity') +
theme_clean() + ylab("- log10 p-value") + xlab("") +
scale_fill_manual(values = c("colored" = myColor, "nocolor" = "gray")) +
scale_y_continuous(expand = expansion(mult = c(0, .1)),
breaks = scales::pretty_breaks()) +
coord_flip()
}Results
Write files
processDE(res.NoEvsTSC, "NoEvsTSC")
processDE(res.NoEvsTGC, "NoEvsTGC")Htr2b gene expression
gnCountFn(geneName="ENSMUSG00000026228.7")placeholder
Volcano plots
no_EVs vs. pTGC_EV
g <- volcanoPlots(
res.NoEvsTGC,
"NoEvsTGC",
"No_EVs",
"pTGC_EVs",
"#7acc37",
"#4d4d4d",
"#8e27af",
ChartTitle = "No_EVs vs. pTGC_EVs"
)
g
#> Warning: Removed 2397 rows containing missing values (`geom_point()`).
#> Warning: Removed 15477 rows containing missing values (`geom_label_repel()`).Fig X: No_EVs vs. pTGC_EVs
No_EVs vs. mTSC_EV
g <- volcanoPlots(
res.NoEvsTSC,
"NoEvsTSC",
"No_EVs",
"mTSC_EV",
"#0178c7",
"#4d4d4d",
"#7acc37",
ChartTitle = "No_EVs vs. mTSC_EV"
)
g
#> Warning: Removed 2397 rows containing missing values (`geom_point()`).
#> Warning: Removed 15477 rows containing missing values (`geom_label_repel()`).Fig X: No_EVs vs. pTSC_EVs
Overlap of genes TSC EV (vs. no EV) and pTGC EV (vs. no EV)
All significant
library(ggvenn)
y <- list(pTGC = c(NoEvsTGC.dw.pce1,NoEvsTGC.up.pce1),
TSC = c(NoEvsTSC.dw.pce1, NoEvsTSC.up.pce1))
ggvenn(
y,
fill_color = c("#8e27af", "#0178c7"),
stroke_size = 0.3,
set_name_size = 4
)placeholder
upregulated in pTGC or TSC
library(ggvenn)
y <- list(pTGC = NoEvsTGC.dw.pce1, TSC = NoEvsTSC.dw.pce1)
ggvenn(
y,
fill_color = c("#8e27af", "#0178c7"),
stroke_size = 0.3,
set_name_size = 4
)placeholder
upregulated in no EV
library(ggvenn)
y <- list(pTGC = NoEvsTGC.up.pce1, TSC = NoEvsTSC.up.pce1)
ggvenn(
y,
fill_color = c("#8e27af", "#0178c7"),
stroke_size = 0.3,
set_name_size = 4
)placeholder
TE: No_EV vs. pTGC
up in NoEvs
plotTE(unique(NoEvsTGC.up.pce2), "#7acc37")Fig X: TE for genes upregulated in No_EV samples
up in pTGC
plotTE(unique(NoEvsTGC.dw.pce2), "#8e27af")Fig X: TE for genes upregulated in pTGC samples
TE: No_EV vs. mTSC
up in NoEvs
plotTE(unique(NoEvsTSC.up.pce2), "#7acc37")Fig X: TE for genes upregulated in No_EV sample
up in mTSC
plotTE(unique(NoEvsTSC.dw.pce2), "#0178c7")Fig X: TE for genes upregulated in mTSC samples
Enrichment Analyses
setEnrichrSite("Enrichr")
websiteLive <- TRUE
myDBs <-
c(
"DisGeNET",
"WikiPathways_2019_Human",
"WikiPathways_2019_Mouse",
"KEGG_2019_Mouse"
)
if (websiteLive) {
NoEvsTGC.up.enriched <- enrichr(NoEvsTGC.up.pce2, myDBs)
Sys.sleep(60)
NoEvsTGC.dw.enriched <- enrichr(NoEvsTGC.dw.pce2, myDBs)
Sys.sleep(60)
NoEvsTSC.up.enriched <- enrichr(NoEvsTSC.up.pce2, myDBs)
Sys.sleep(60)
NoEvsTSC.dw.enriched <- enrichr(NoEvsTSC.dw.pce2, myDBs)
}
#> Uploading data to Enrichr... Done.
#> Querying DisGeNET... Done.
#> Querying WikiPathways_2019_Human... Done.
#> Querying WikiPathways_2019_Mouse... Done.
#> Querying KEGG_2019_Mouse... Done.
#> Parsing results... Done.
#> Uploading data to Enrichr... Done.
#> Querying DisGeNET... Done.
#> Querying WikiPathways_2019_Human... Done.
#> Querying WikiPathways_2019_Mouse... Done.
#> Querying KEGG_2019_Mouse... Done.
#> Parsing results... Done.
#> Uploading data to Enrichr... Done.
#> Querying DisGeNET... Done.
#> Querying WikiPathways_2019_Human... Done.
#> Querying WikiPathways_2019_Mouse... Done.
#> Querying KEGG_2019_Mouse... Done.
#> Parsing results... Done.
#> Uploading data to Enrichr... Done.
#> Querying DisGeNET... Done.
#> Querying WikiPathways_2019_Human... Done.
#> Querying WikiPathways_2019_Mouse... Done.
#> Querying KEGG_2019_Mouse... Done.
#> Parsing results... Done.up in NoEvs (No_EV vs. pTGC)
DisGeNET
plotEnrichR(NoEvsTGC.up.enriched, table="DisGeNET" , myColor = "#7acc37")placeholder
WikiPathways_2019_Human
plotEnrichR(NoEvsTGC.up.enriched, table="WikiPathways_2019_Human" , "#7acc37")placeholder
WikiPathways_2019_Mouse
plotEnrichR(NoEvsTGC.up.enriched, table="WikiPathways_2019_Mouse" , "#7acc37")placeholder
KEGG_2019_Mouse
plotEnrichR(NoEvsTGC.up.enriched, table="KEGG_2019_Mouse" , "#7acc37")placeholder
up in pTGC (No_EV vs. pTGC)
DisGeNET
plotEnrichR(NoEvsTGC.dw.enriched, table="DisGeNET" , myColor = "#8e27af")placeholder
WikiPathways_2019_Human
plotEnrichR(NoEvsTGC.dw.enriched, table="WikiPathways_2019_Human" , "#8e27af")placeholder
WikiPathways_2019_Mouse
plotEnrichR(NoEvsTGC.dw.enriched, table="WikiPathways_2019_Mouse" , "#8e27af")placeholder
KEGG_2019_Mouse
plotEnrichR(NoEvsTGC.dw.enriched, table="KEGG_2019_Mouse" , "#8e27af")placeholder
up in NoEvs (No_EV vs. mTSC)
DisGeNET
plotEnrichR(NoEvsTSC.up.enriched, table="DisGeNET" , myColor = "#7acc37")placeholder
WikiPathways_2019_Human
plotEnrichR(NoEvsTSC.up.enriched, table="WikiPathways_2019_Human" , "#7acc37")placeholder
WikiPathways_2019_Mouse
plotEnrichR(NoEvsTGC.up.enriched, table="WikiPathways_2019_Mouse" , "#7acc37")placeholder
KEGG_2019_Mouse
plotEnrichR(NoEvsTGC.up.enriched, table="KEGG_2019_Mouse" , "#7acc37")placeholder
up in mTSC (No_EV vs. mTSC)
DisGeNET
plotEnrichR(NoEvsTSC.dw.enriched, table="DisGeNET" , myColor = "#0178c7")placeholder
WikiPathways_2019_Human
plotEnrichR(NoEvsTSC.dw.enriched, table="WikiPathways_2019_Human" , "#0178c7")placeholder
WikiPathways_2019_Mouse
plotEnrichR(NoEvsTSC.dw.enriched, table="WikiPathways_2019_Mouse" , "#0178c7")placeholder
KEGG_2019_Mouse
plotEnrichR(NoEvsTSC.dw.enriched, table="KEGG_2019_Mouse" , "#0178c7")placeholder
Overlap analyses
For the top quartile (expression) miRNAs, targets were predicted in the previous sheet. Here we will import them (both target gene lists as well as brain specific target gene lists) and perform overlap analyses with the bulk RNAseq results.
brainGeneLists <- readRDS("results/brainGeneLists.rds")
targetLists <- readRDS("results/targetsLists_75pc.rds")mergeTargetsWithDEG <- function(deTable = NoEvsTSC.DE.table,
targetTable = targetLists$miRNAs_75pc_pTGC,
brainGenes = brainGeneLists$brEnrTargets_75pc_pTGC) {
# thresholds
fc = 1.5
log2fc = log(fc, base = 2)
neg.log2fc = log2fc * -1
# make clean DE table
cleanDE <- deTable %>%
dplyr::select(Gene, log2FoldChange, padj, external_gene_name, description) %>%
mutate(Gene = gsub("\\..*", "", Gene))
# make clean brain gene list
brainPAV <- as.data.frame(brainGenes) %>%
rename(SYMBOL = 1) %>%
mutate(SYMBOL = toupper(SYMBOL), BrainSpecific = 1)
# merge targets with clean tables
mergedTable <-
targetTable %>%
left_join(cleanDE, by = c("ENSEMBL" = "Gene")) %>%
mutate(SYMBOL = toupper(SYMBOL)) %>%
mutate(RNAseq = case_when((log2FoldChange > log2fc &
padj <= 0.05) ~ 'up',
(log2FoldChange < neg.log2fc &
padj <= 0.05) ~ 'down',
TRUE ~ 'NA'
)) %>%
dplyr::select(mirbase_id, ENSEMBL, SYMBOL, RNAseq, log2FoldChange, padj, description) %>%
left_join(as_tibble(brainPAV), by = "SYMBOL") %>%
filter(mirbase_id != "") %>% distinct()
mergedTable
}createSummaryTable <-
function(mergedTabe = NoEvsTSC.DE_with_pTGC.75pc_targets) {
t1 <- mergedTabe %>%
group_by(mirbase_id, RNAseq) %>%
summarise(BrainIntersecting = sum(!is.na(BrainSpecific))) %>%
filter(mirbase_id != "") %>%
group_by(mirbase_id) %>%
spread(RNAseq, BrainIntersecting) %>%
select(mirbase_id, down, up, `NA`) %>%
rename(brainDownDE = down,
brainUpDE = up,
Brain.NA = `NA`) %>%
mutate(totalBrain = sum(brainDownDE,
Brain.NA,
brainUpDE,
na.rm =TRUE))
t2 <- mergedTabe %>%
group_by(mirbase_id, RNAseq) %>%
summarise(AllIntersecting = n()) %>%
filter(mirbase_id != "") %>%
group_by(mirbase_id) %>%
spread(RNAseq, AllIntersecting) %>%
select(mirbase_id, down, up, `NA`) %>%
rename(
TargetsDownDE = down,
TargetsUpDE = up,
TargetsNA = `NA`
) %>%
mutate(totalTargets = sum(TargetsDownDE,
TargetsUpDE,
TargetsNA,
na.rm =TRUE))
summaryTable <- inner_join(t1, t2, by = "mirbase_id")
summaryTable
}NoEvsTSC.DE_with_mTSC.75pc_targets <-
mergeTargetsWithDEG(
deTable = NoEvsTSC.DE.table,
targetTable = targetLists$miRNAs_75pc_mTSC,
brainGenes = brainGeneLists$brEnrTargets_75pc_mTSC
)
NoEvsTSC.DE_with_mTSC.75pc_summary <-
createSummaryTable(NoEvsTSC.DE_with_mTSC.75pc_targets)
write_delim(
NoEvsTSC.DE_with_mTSC.75pc_targets ,
"results/NoEvsTSC.DE_with_mTSC.75pc_targets.tsv",
delim = "\t"
)
write_delim(
NoEvsTSC.DE_with_mTSC.75pc_summary,
"results/NoEvsTSC.DE_with_mTSC.75pc_summary.tsv",
delim = "\t"
)NoEvsTGC.DE_with_pTGC.75pc_targets <-
mergeTargetsWithDEG(
deTable = NoEvsTGC.DE.table,
targetTable = targetLists$miRNAs_75pc_pTGC,
brainGenes = brainGeneLists$brEnrTargets_75pc_pTGC
)
NoEvsTGC.DE_with_pTGC.75pc_summary <-
createSummaryTable(NoEvsTGC.DE_with_pTGC.75pc_targets)
write_delim(
NoEvsTGC.DE_with_pTGC.75pc_targets,
"results/NoEvsTGC.DE_with_pTGC.75pc_targets.tsv",
delim = "\t"
)
write_delim(
NoEvsTGC.DE_with_pTGC.75pc_summary,
"results/NoEvsTGC.DE_with_pTGC.75pc_summary.tsv",
delim = "\t"
)MultiQC report
MultiQC report is available at this link
Session Information
sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] grid stats4 stats graphics grDevices utils datasets
#> [8] methods base
#>
#> other attached packages:
#> [1] ggvenn_0.1.10 enrichR_3.1
#> [3] DT_0.27 htmlwidgets_1.5.4
#> [5] kableExtra_1.3.4 scales_1.2.1
#> [7] cowplot_1.1.1 plotly_4.10.1
#> [9] TissueEnrich_1.18.0 GSEABase_1.60.0
#> [11] graph_1.76.0 annotate_1.76.0
#> [13] XML_3.99-0.12 AnnotationDbi_1.60.0
#> [15] ensurer_1.1 biomaRt_2.54.1
#> [17] reshape2_1.4.4 RColorBrewer_1.1-3
#> [19] ggrepel_0.9.3 pheatmap_1.0.12
#> [21] vsn_3.66.0 DESeq2_1.38.3
#> [23] SummarizedExperiment_1.28.0 Biobase_2.58.0
#> [25] MatrixGenerics_1.10.0 matrixStats_0.63.0
#> [27] GenomicRanges_1.50.1 GenomeInfoDb_1.34.3
#> [29] IRanges_2.32.0 S4Vectors_0.36.0
#> [31] BiocGenerics_0.44.0 forcats_0.5.2
#> [33] stringr_1.5.0 dplyr_1.0.10
#> [35] purrr_1.0.1 readr_2.1.3
#> [37] tidyr_1.3.0 tibble_3.1.8
#> [39] ggplot2_3.4.0 tidyverse_1.3.2
#>
#> loaded via a namespace (and not attached):
#> [1] readxl_1.4.1 backports_1.4.1 BiocFileCache_2.6.1
#> [4] systemfonts_1.0.4 plyr_1.8.8 lazyeval_0.2.2
#> [7] crosstalk_1.2.0 BiocParallel_1.32.1 digest_0.6.30
#> [10] htmltools_0.5.3 fansi_1.0.3 magrittr_2.0.3
#> [13] memoise_2.0.1 googlesheets4_1.0.1 tzdb_0.3.0
#> [16] limma_3.54.0 Biostrings_2.66.0 modelr_0.1.10
#> [19] vroom_1.6.0 svglite_2.1.0 timechange_0.1.1
#> [22] rmdformats_1.0.4 prettyunits_1.1.1 colorspace_2.0-3
#> [25] blob_1.2.3 rvest_1.0.3 rappdirs_0.3.3
#> [28] haven_2.5.1 xfun_0.35 crayon_1.5.2
#> [31] RCurl_1.98-1.9 jsonlite_1.8.3 glue_1.6.2
#> [34] gtable_0.3.1 gargle_1.2.1 zlibbioc_1.44.0
#> [37] XVector_0.38.0 webshot_0.5.4 DelayedArray_0.24.0
#> [40] DBI_1.1.3 Rcpp_1.0.9 viridisLite_0.4.1
#> [43] xtable_1.8-4 progress_1.2.2 bit_4.0.5
#> [46] preprocessCore_1.60.2 httr_1.4.4 ellipsis_0.3.2
#> [49] farver_2.1.1 pkgconfig_2.0.3 sass_0.4.3
#> [52] dbplyr_2.2.1 locfit_1.5-9.7 utf8_1.2.2
#> [55] tidyselect_1.2.0 labeling_0.4.2 rlang_1.1.1
#> [58] munsell_0.5.0 cellranger_1.1.0 tools_4.2.2
#> [61] cachem_1.0.6 cli_3.4.1 generics_0.1.3
#> [64] RSQLite_2.2.18 broom_1.0.1 evaluate_0.18
#> [67] fastmap_1.1.0 yaml_2.3.6 knitr_1.41
#> [70] bit64_4.0.5 fs_1.5.2 KEGGREST_1.38.0
#> [73] xml2_1.3.3 compiler_4.2.2 rstudioapi_0.14
#> [76] filelock_1.0.2 curl_4.3.3 png_0.1-7
#> [79] affyio_1.68.0 reprex_2.0.2 geneplotter_1.76.0
#> [82] bslib_0.4.1 stringi_1.7.8 highr_0.9
#> [85] lattice_0.20-45 Matrix_1.5-3 vctrs_0.6.2
#> [88] pillar_1.8.1 lifecycle_1.0.3 BiocManager_1.30.19
#> [91] jquerylib_0.1.4 data.table_1.14.6 bitops_1.0-7
#> [94] R6_2.5.1 affy_1.76.0 bookdown_0.33
#> [97] codetools_0.2-18 assertthat_0.2.1 rjson_0.2.21
#> [100] withr_2.5.0 GenomeInfoDbData_1.2.9 parallel_4.2.2
#> [103] hms_1.1.2 rmarkdown_2.18 googledrive_2.0.0
#> [106] lubridate_1.9.0