Bulk RNAseq: Differential Expression Analysis
Environment Setup
salloc -N 1 --exclusive -p amd -t 8:00:00
# use the same env used for #1
conda activate smallrna
# working dir
mkdir -p /work/LAS/geetu-lab/arnstrm/mouse.ev.RNAseq
cd /work/LAS/geetu-lab/arnstrm/mouse.ev.RNAseq
# file structure
tree -L 1
.
├── 1_data
├── 2_fastqc
├── 3_STAR-mapping
├── 4_featureCounts
└── 5_multiqc
Raw data
Raw data was downloaded from the sequencing facility using the
rsync
command, with authentication. The downloaded files
were checked for md5sum and compared against list of files expected as
per the input samples provided.
cd 1_data
rsync -rltvPh ext-arnstrm@download.bioinformatics.missouri.edu:xyz .
# link masked
# GEO link will be included later
Genome/annotation
Additional files required for the analyses were downloaded from GenCode. The downloaded files are as follows:
cd 3_STAR-mapping
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M30/GRCm39.primary_assembly.genome.fa.gz
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M30/gencode.vM30.annotation.gff3.gz
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M30/gencode.vM30.annotation.gtf.gz
gunzip GRCm39.primary_assembly.genome.fa.gz
gunzip gencode.vM30.annotation.gff3.gz
gunzip gencode.vM30.annotation.gtf.gz
# ids for prot coding genes
awk '$3=="gene" {print $9}' gencode.vM30.annotation.gff3 |\
cut -f 1-3 -d ";" |\
sed -e 's/;/\t/g' -e 's/gene_id=//g' -e 's/ID=//g' |\
grep "gene_type=protein_coding" |\
cut -f 1 > mm10.protein_coding
FastQC
Quality inspection of the reads. The multiqc
report,
collating all samples together are provided as html file.
cd 2_fastqc
for fq in ../1_data/*.fq.gz; do
fastqc --threads $SLURM_JOB_CPUS_PER_NODE $fq;
done
Mapping
To index the genome, following command was run (in an interactive session).
fastaGenome="GRCm39.genome.fa"
gtf="gencode.vM30.annotation.gtf"
STAR --runThreadN $SLURM_JOB_CPUS_PER_NODE \
--runMode genomeGenerate \
--genomeDir $(pwd) \
--genomeFastaFiles $fastaGenome \
--sjdbGTFfile $gtf \
--sjdbOverhang 1
Each fastq
file was mapped to the indexed genome as
using runSTAR_map.sh
script shown below:
#!/bin/bash
read1=$1
read2=$(echo ${read1} | sed 's/_L003_R1_001.fastq.gz/_L003_R2_001.fastq.gz/g')
cpus=${SLURM_JOB_CPUS_PER_NODE}
out=$(basename ${read1} | sed 's/_L003_R1_001.fastq.gz//g')
index=/work/LAS/geetu-lab/arnstrm/GRCm39_index
STAR \
${cpus} \
--runThreadN ${index} \
--genomeDir \
--outSAMtype BAM SortedByCoordinate \
--quantMode GeneCounts \
--outFilterScoreMinOverLread 0.3 \
--outFilterMatchNminOverLread 0.3 ${out}_ \
--outFileNamePrefix \
--readFilesCommand zcat \
--outWigType bedGraph \
--outWigStrand Unstranded \
--outWigNorm RPM ${read1} ${read2} --readFilesIn
Mapping was run with a simple loop:
for fq in *_R1_*fastq.gz; do
runSTAR_map.sh $fq;
done
Counts
For generating counts from the mapped reads, we used
subread
package program featureCounts
. All bam
files were supplied togehter to generate a single count file for
individual samples.
cd 3_STAR-mapping
realpath *.bam > ../4_featureCounts/bam.fofn
cd ../4_featureCounts
while read line; do
ln -s $line;
done
featureCounts \
-T ${SLURM_CPUS_ON_NODE} \
-a gencode.vM30.annotation.gtf \
-t exon \
-g gene_id \
-p \
-B \
--countReadPairs \
-o merged_counts.txt \
--tmpDir ./tmp *.bam
The generated counts file was processed to use it direclty with
DESeq2
cut -f 1,7- merged_counts.txt |\
grep -v "^#" |\
sed 's/_S...\?_Aligned.sortedByCoord.out.bam//g' |\
sed 's/mNCSC_//g' > merged_counts-clean.tsv
grep -Fw -f mm10.protein_coding merged_counts-clean.tsv > body
head -n 1 merged_counts-clean.tsv > head
cat head body >> merged_counts-clean-prot.tsv
rm head body
Create a info file:
head -n 1 merged_counts-clean.tsv |\
tr "\t" "\n" |\
grep -v "^Geneid" |\
awk '{print $1"\t"$1}' |\
sed 's/_.$//g' > info.tsv
Differential expression analysis
Differential expression (DE) analyses using DESeq2
was
performed as shown below.
Prerequisites
R packages required for this section are loaded
# set path
setwd("/work/LAS/geetu-lab/arnstrm/mouse.trophoblast.smallRNAseq")
# load the modules
library(tidyverse)
library(DESeq2)
library(vsn)
library(pheatmap)
library(ggrepel)
library(RColorBrewer)
library(reshape2)
require(biomaRt)
library(TissueEnrich)
library(plotly)
library(cowplot)
library(biomaRt)
library(scales)
library(kableExtra)
library(htmlwidgets)
library(DT)
library(enrichR)
Import datasets
The counts
data and its associated metadata
(coldata
) are imported for analyses.
= 'assets/merged_counts-clean-prot.tsv'
countsFile = 'assets/info.tsv'
groupFile <-
coldata read.csv(
groupFile,row.names = 1,
sep = "\t",
header = FALSE,
stringsAsFactors = TRUE
)colnames(coldata) <- "condition"
<- as.matrix(read.delim(countsFile, row.names = 1, header = TRUE)) cts
Reorder columns of cts
according to coldata
rows. Check if samples in both files match.
all(rownames(coldata) %in% colnames(cts))
#> [1] TRUE
<- cts[, rownames(coldata)] cts
DESeq2
The batch corrected read counts are then used for running DESeq2 analyses
<- DESeqDataSetFromMatrix(countData = cts,
dds colData = coldata,
design = ~ condition)
<- vst(dds, blind = FALSE)
vsd <- rowSums(counts(dds)) >= 10
keep <- dds[keep, ]
dds <- DESeq(dds) dds
PCA plot for QC
PCA plot for the dataset that includes all libraries.
<- rowVars(assay(vsd))
rv <-
pcaData plotPCA(vsd,
intgroup = "condition",
returnData = TRUE)
<- round(100 * attr(pcaData, "percentVar"))
percenttVar1 # @ Thomas W. Battaglia
#' Plot DESeq2's PCA plotting with Plotly 3D scatterplot
#'
#' The function will generate a plot_ly 3D scatter plot image for
#' a 3D exploration of the PCA.
#'
#' @param object a DESeqTransform object, with data in assay(x), produced for example by either rlog or varianceStabilizingTransformation.
#' @param intgroup interesting groups: a character vector of names in colData(x) to use for grouping
#' @param ntop number of top genes to use for principal components, selected by highest row variance
#' @param returnData should the function only return the data.frame of PC1, PC2 and PC3 with intgroup covariates for custom plotting (default is FALSE)
#' @return An object created by plot_ly, which can be assigned and further customized.
#' @export
<-
plotPCA3D function (object,
intgroup = "condition",
ntop = 500,
returnData = FALSE) {
<- rowVars(assay(object))
rv <-
select order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
<- prcomp(t(assay(object)[select,]))
pca <- pca$sdev ^ 2 / sum(pca$sdev ^ 2)
percentVar if (!all(intgroup %in% names(colData(object)))) {
stop("the argument 'intgroup' should specify columns of colData(dds)")
}<-
intgroup.df as.data.frame(colData(object)[, intgroup, drop = FALSE])
<- if (length(intgroup) > 1) {
group factor(apply(intgroup.df, 1, paste, collapse = " : "))
}else {
colData(object)[[intgroup]]
}<- data.frame(
d PC1 = pca$x[, 1],
PC2 = pca$x[, 2],
PC3 = pca$x[, 3],
group = group,
intgroup.df,name = colnames(object)
)if (returnData) {
attr(d, "percentVar") <- percentVar[1:3]
return(d)
}message("Generating plotly plot")
<- plotly::plot_ly(
p data = d,
x = ~ PC1,
y = ~ PC2,
z = ~ PC3,
color = group,
mode = "markers",
type = "scatter3d"
)return(p)
}
<-
select order(rv, decreasing = TRUE)[seq_len(min(500, length(rv)))]
<- prcomp(t(assay(vsd)[select, ]))
pca <- pca$sdev ^ 2 / sum(pca$sdev ^ 2)
percentVar = "condition"
intgroup <- as.data.frame(colData(vsd)[, intgroup, drop = FALSE])
intgroup.df <- if (length(intgroup) == 1) {
group factor(apply(intgroup.df, 1, paste, collapse = " : "))
}<- data.frame(
d PC1 = pca$x[, 1],
PC2 = pca$x[, 2],
intgroup.df,name = colnames(vsd)
)ggplot(d, aes(PC1, PC2, color = condition)) +
scale_shape_manual(values = 1:12) +
scale_color_manual(values = c('mTSC_EV' = '#0178c7',
'No_EV' = '#7acc37',
'pTGC_EV' = '#8e27af')) +
theme_bw() +
theme(legend.title = element_blank()) +
geom_point(size = 2, stroke = 2) +
geom_text_repel(aes(label = name)) +
xlab(paste("PC1", round(percentVar[1] * 100, 2), "% variance")) +
ylab(paste("PC2", round(percentVar[2] * 100, 2), "% variance"))
Interactive 3D PCA plot
= plotPCA3D(vsd, intgroup = "condition")
g g
View the interactive 3D PCA plot in a new tab.
Sample distance for QC
<- dist(t(assay(vsd)))
sampleDists <- as.matrix( sampleDists )
sampleDistMatrix rownames(sampleDistMatrix) <- colnames(vsd)
colnames(sampleDistMatrix) <- NULL
<- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
colors pheatmap(sampleDistMatrix,
clustering_distance_rows = sampleDists,
clustering_distance_cols = sampleDists,
col = colors)
Set contrasts and find DE genes
<-
res.NoEvsTSC results(dds,
contrast = c(
"condition",
"No_EV",
"mTSC_EV"))
<-
res.NoEvsTGC results(dds,
contrast = c(
"condition",
"No_EV",
"pTGC_EV"))
Functions
Processing DE objects
<- function(res.se, string) {
processDE = 1.5
fc = log(fc, base = 2)
log2fc = log2fc * -1
neg.log2fc <- res.se[order(res.se$padj), ]
res.se <-
res.data merge(as.data.frame(res.se),
as.data.frame(counts(dds, normalized = TRUE)),
by = "row.names",
sort = FALSE)
names(res.data)[1] <- "Gene"
<-
res.up %>%
res.data filter(log2FoldChange >= log2fc) %>%
filter(padj <= 0.05) %>%
arrange(desc(log2FoldChange)) %>%
::select(Gene)
dplyr<-
res.dw %>%
res.data filter(log2FoldChange <= neg.log2fc) %>%
filter(padj <= 0.05) %>%
arrange(desc(log2FoldChange)) %>%
::select(Gene)
dplyr<-
res.up.new $ensembl_gene_id_version %in% res.up$Gene,]
annot[annot<-
res.dw.new $ensembl_gene_id_version %in% res.dw$Gene,]
annot[annot<-
res.data.info left_join(res.data, mart, by = c('Gene' = 'ensembl_gene_id_version'))
<- res.data.info %>%
res.data.filtered filter(padj <= 0.05) %>%
filter(log2FoldChange >= log2fc | log2FoldChange <= neg.log2fc) %>%
arrange(desc(log2FoldChange))
<- paste0(string, ".up.pce", 1)
pce.up1 <- paste0(string, ".dw.pce", 1)
pce.dw1 <- paste0(string, ".up.pce", 2)
pce.up2 <- paste0(string, ".dw.pce", 2)
pce.dw2 <- paste0(string, ".DE.table")
DEGtable assign(pce.up1, as.character(res.up.new$ensembl_gene_id), envir = .GlobalEnv)
assign(pce.dw1, as.character(res.dw.new$ensembl_gene_id), envir = .GlobalEnv)
assign(pce.up2, as.character(res.up.new$external_gene_name), envir = .GlobalEnv)
assign(pce.dw2, as.character(res.dw.new$external_gene_name), envir = .GlobalEnv)
assign(DEGtable, res.data.info, envir = .GlobalEnv)
# save full table
write_delim(
res.data.info,file = paste0("results/DESeq2results-", string, "_fc.tsv"),
delim = "\t"
)# save filtered table (fc = 1.5 & padj <= 0.05)
write_delim(
res.data.filtered,file = paste0("results/DE_", string, "_filtered.tsv"),
delim = "\t"
) }
Gene information
= useMart("ENSEMBL_MART_ENSEMBL")
ensembl listDatasets(ensembl) %>%
filter(str_detect(description, "Mouse"))
= useDataset("mmusculus_gene_ensembl", mart = ensembl)
ensembl listFilters(ensembl) %>%
filter(str_detect(name, "ensembl"))
<- "ensembl_gene_id_version"
filterType head(rownames(counts))
<- read.delim(countsFile, row.names = 1, header = TRUE)
counts head(rownames(counts))
<- rownames(counts)
filterValues listAttributes(ensembl) %>%
head(20)
<- c('ensembl_gene_id_version',
attributeNames 'ensembl_gene_id',
'external_gene_name')
<- getBM(
annot attributes = attributeNames,
filters = filterType,
values = filterValues,
mart = ensembl
)<- c('ensembl_gene_id_version',
attributeNames 'gene_biotype',
'external_gene_name',
'description')
<- getBM(
mart attributes = attributeNames,
filters = filterType,
values = filterValues,
mart = ensembl
)write_delim(
annot,file = "assets/annot.tsv",
delim = "\t"
)write_delim(
mart,file = "assets/mart.tsv",
delim = "\t"
)
Files were saved, so we don’t query BioMart everytime we run the markdown. The files will be loaded, instead
<-
mart read.csv(
"assets/mart.tsv",
sep = "\t",
header = TRUE,
)<-
annot read.csv(
"assets/annot.tsv",
sep = "\t",
header = TRUE,
)
Volcano plotting function
<- volcanoPlots <-
g function(res.se,
string,
first,
second,
color1,
color2,
color3,
ChartTitle) {= 1.5
fc = log(fc, base = 2)
log2fc = log2fc * -1
neg.log2fc <- res.se[order(res.se$padj),]
res.se <-
res.se rownames_to_column(as.data.frame(res.se[order(res.se$padj), ]))
names(res.se)[1] <- "Gene"
<-
res.data merge(res.se,
mart,by.x = "Gene",
by.y = "ensembl_gene_id_version")
<- res.data %>% mutate_all(na_if, "")
res.data <- res.data %>% mutate_all(na_if, " ")
res.data <-
res.data %>% mutate(external_gene_name = coalesce(external_gene_name, Gene))
res.data $diffexpressed <- "other.genes"
res.data$diffexpressed[res.data$log2FoldChange >= log2fc &
res.data$padj <= 0.05] <-
res.datapaste("Higher expression in", first)
$diffexpressed[res.data$log2FoldChange <= neg.log2fc &
res.data$padj <= 0.05] <-
res.datapaste("Higher expression in", second)
<- res.data %>%
upgenes ::filter(log2FoldChange >= log2fc & padj <= 0.05) %>%
dplyrarrange(desc(log2FoldChange)) %>%
mutate(delabel = external_gene_name) %>%
select(Gene, delabel) %>%
top_n(10)
<- res.data %>%
downgenes ::filter(log2FoldChange <= neg.log2fc &
dplyr<= 0.05) %>%
padj arrange(desc(log2FoldChange)) %>%
mutate(delabel = external_gene_name) %>%
select(Gene, delabel) %>%
top_n(10)
<- rbind(upgenes, downgenes)
fullgenes <- left_join(res.data, fullgenes, by = "Gene")
res.data $delabel[res.data$external_gene_name == "Htr2b"] <-
res.data"Htr2b"
ggplot(res.data,
aes(
x = log2FoldChange,
y = -log10(padj),
col = diffexpressed,
label = delabel
+
)) geom_point(alpha = 0.5) +
xlim(-2.5, 2.5) +
theme_classic() +
scale_color_manual(name = "Expression", values = c(color1, color3, color2)) +
geom_label_repel(
data = res.data,
aes(size = 0.5, point.size = 0.5),
max.overlaps = Inf,
force_pull = 0,
min.segment.length = 0.5,
show.legend = F,
seed = 11,
box.padding = 0.5
+
) ggtitle(ChartTitle) +
xlab(paste("log2 fold change")) +
ylab("-log10 pvalue (adjusted)") +
theme(legend.text.align = 0)
}
GenePlot function
<- function(geneName="ENSMUSG00000026228.7") {
gnCountFn <- as.data.frame(counts(dds, normalized = TRUE)[geneName, ])
geneCounts colnames(geneCounts) <- "normalizedCounts"
<- geneCounts %>%
geneCounts rownames_to_column("condition") %>%
mutate(condition = str_sub(condition, 1, str_length(condition)-2))
$condition <- factor(geneCounts$condition,
geneCountslevels=c('No_EV', 'mTSC_EV', 'pTGC_EV'))
<- ggplot(geneCounts, aes(x = condition, y = normalizedCounts, fill = condition)) +
g geom_boxplot(color = "black") +
stat_summary(
fun.y = mean,
geom = "point",
shape = 23,
size = 2
+
) geom_dotplot(
binaxis = 'y',
binwidth = 1,
stackdir = 'center',
dotsize = 0.75
+
) scale_fill_manual(values = c(
"No_EV" = "#7acc37",
"mTSC_EV" = "#0178c7",
"pTGC_EV" = "#8e27af"
+
)) theme_clean()
g }
TissueEnrich function
source("assets/theme_clean.R")
<- function(inputGenes = gene.list,
plotTE myColor = "color") {
<-
gs GeneSet(geneIds = inputGenes,
organism = "Mus Musculus",
geneIdType = SymbolIdentifier())
<- teEnrichment(inputGenes = gs, rnaSeqDataset = 3)
output <-
en.output setNames(data.frame(assay(output[[1]]),
row.names = rowData(output[[1]])[, 1]),
colData(output[[1]])[, 1])
$Tissue <- rownames(en.output)
en.output<- -log10(0.05)
logp <-
en.output mutate(en.output,
significance = ifelse(Log10PValue > logp,
"colored", "nocolor"))
$Sig <- "NA"
en.outputggplot(en.output, aes(reorder(Tissue, Log10PValue),
Log10PValue,fill = significance)) +
geom_bar(stat = 'identity') +
theme_clean() + ylab("- log10 adj. p-value") + xlab("") +
scale_fill_manual(values = c("colored" = myColor, "nocolor" = "gray")) +
scale_y_continuous(expand = expansion(mult = c(0, .1)),
breaks = scales::pretty_breaks()) +
coord_flip()
}
enrichR function
<- function(enriched, table="string", myColor = "slateblue") {
plotEnrichR <- -log10(0.05)
logp <- enriched[[table]]
myData $negLogP <- -log10(myData$P.value)
myData<-
myData mutate(myData,
significance = ifelse(negLogP > logp, "colored", "nocolor"))
$Sig <- "NA"
myData<- head(arrange(myData, -negLogP, Term), 15)
myData ggplot(myData, aes(reorder(Term, negLogP),
negLogP,fill = significance)) +
geom_bar(stat = 'identity') +
theme_clean() + ylab("- log10 p-value") + xlab("") +
scale_fill_manual(values = c("colored" = myColor, "nocolor" = "gray")) +
scale_y_continuous(expand = expansion(mult = c(0, .1)),
breaks = scales::pretty_breaks()) +
coord_flip()
}
Results
Write files
processDE(res.NoEvsTSC, "NoEvsTSC")
processDE(res.NoEvsTGC, "NoEvsTGC")
Htr2b gene expression
gnCountFn(geneName="ENSMUSG00000026228.7")
Volcano plots
no_EVs vs. pTGC_EV
<- volcanoPlots(
g
res.NoEvsTGC,"NoEvsTGC",
"No_EVs",
"pTGC_EVs",
"#7acc37",
"#4d4d4d",
"#8e27af",
ChartTitle = "No_EVs vs. pTGC_EVs"
)
g#> Warning: Removed 2397 rows containing missing values (`geom_point()`).
#> Warning: Removed 15477 rows containing missing values (`geom_label_repel()`).
No_EVs vs. mTSC_EV
<- volcanoPlots(
g
res.NoEvsTSC,"NoEvsTSC",
"No_EVs",
"mTSC_EV",
"#0178c7",
"#4d4d4d",
"#7acc37",
ChartTitle = "No_EVs vs. mTSC_EV"
)
g#> Warning: Removed 2397 rows containing missing values (`geom_point()`).
#> Warning: Removed 15477 rows containing missing values (`geom_label_repel()`).
Overlap of genes TSC EV (vs. no EV) and pTGC EV (vs. no EV)
All significant
library(ggvenn)
<- list(pTGC = c(NoEvsTGC.dw.pce1,NoEvsTGC.up.pce1),
y TSC = c(NoEvsTSC.dw.pce1, NoEvsTSC.up.pce1))
ggvenn(
y,fill_color = c("#8e27af", "#0178c7"),
stroke_size = 0.3,
set_name_size = 4
)
upregulated in pTGC or TSC
library(ggvenn)
<- list(pTGC = NoEvsTGC.dw.pce1, TSC = NoEvsTSC.dw.pce1)
y ggvenn(
y,fill_color = c("#8e27af", "#0178c7"),
stroke_size = 0.3,
set_name_size = 4
)
upregulated in no EV
library(ggvenn)
<- list(pTGC = NoEvsTGC.up.pce1, TSC = NoEvsTSC.up.pce1)
y ggvenn(
y,fill_color = c("#8e27af", "#0178c7"),
stroke_size = 0.3,
set_name_size = 4
)
TE: No_EV vs. pTGC
up in NoEvs
plotTE(unique(NoEvsTGC.up.pce2), "#7acc37")
up in pTGC
plotTE(unique(NoEvsTGC.dw.pce2), "#8e27af")
TE: No_EV vs. mTSC
up in NoEvs
plotTE(unique(NoEvsTSC.up.pce2), "#7acc37")
up in mTSC
plotTE(unique(NoEvsTSC.dw.pce2), "#0178c7")
Enrichment Analyses
setEnrichrSite("Enrichr")
<- TRUE
websiteLive <-
myDBs c(
"DisGeNET",
"WikiPathways_2019_Human",
"WikiPathways_2019_Mouse",
"KEGG_2019_Mouse"
)if (websiteLive) {
<- enrichr(NoEvsTGC.up.pce2, myDBs)
NoEvsTGC.up.enriched Sys.sleep(60)
<- enrichr(NoEvsTGC.dw.pce2, myDBs)
NoEvsTGC.dw.enriched Sys.sleep(60)
<- enrichr(NoEvsTSC.up.pce2, myDBs)
NoEvsTSC.up.enriched Sys.sleep(60)
<- enrichr(NoEvsTSC.dw.pce2, myDBs)
NoEvsTSC.dw.enriched
}#> Uploading data to Enrichr... Done.
#> Querying DisGeNET... Done.
#> Querying WikiPathways_2019_Human... Done.
#> Querying WikiPathways_2019_Mouse... Done.
#> Querying KEGG_2019_Mouse... Done.
#> Parsing results... Done.
#> Uploading data to Enrichr... Done.
#> Querying DisGeNET... Done.
#> Querying WikiPathways_2019_Human... Done.
#> Querying WikiPathways_2019_Mouse... Done.
#> Querying KEGG_2019_Mouse... Done.
#> Parsing results... Done.
#> Uploading data to Enrichr... Done.
#> Querying DisGeNET... Done.
#> Querying WikiPathways_2019_Human... Done.
#> Querying WikiPathways_2019_Mouse... Done.
#> Querying KEGG_2019_Mouse... Done.
#> Parsing results... Done.
#> Uploading data to Enrichr... Done.
#> Querying DisGeNET... Done.
#> Querying WikiPathways_2019_Human... Done.
#> Querying WikiPathways_2019_Mouse... Done.
#> Querying KEGG_2019_Mouse... Done.
#> Parsing results... Done.
up in NoEvs (No_EV vs. pTGC)
DisGeNET
plotEnrichR(NoEvsTGC.up.enriched, table="DisGeNET" , myColor = "#7acc37")
WikiPathways_2019_Human
plotEnrichR(NoEvsTGC.up.enriched, table="WikiPathways_2019_Human" , "#7acc37")
WikiPathways_2019_Mouse
plotEnrichR(NoEvsTGC.up.enriched, table="WikiPathways_2019_Mouse" , "#7acc37")
KEGG_2019_Mouse
plotEnrichR(NoEvsTGC.up.enriched, table="KEGG_2019_Mouse" , "#7acc37")
up in pTGC (No_EV vs. pTGC)
DisGeNET
plotEnrichR(NoEvsTGC.dw.enriched, table="DisGeNET" , myColor = "#8e27af")
WikiPathways_2019_Human
plotEnrichR(NoEvsTGC.dw.enriched, table="WikiPathways_2019_Human" , "#8e27af")
WikiPathways_2019_Mouse
plotEnrichR(NoEvsTGC.dw.enriched, table="WikiPathways_2019_Mouse" , "#8e27af")
KEGG_2019_Mouse
plotEnrichR(NoEvsTGC.dw.enriched, table="KEGG_2019_Mouse" , "#8e27af")
up in NoEvs (No_EV vs. mTSC)
DisGeNET
plotEnrichR(NoEvsTSC.up.enriched, table="DisGeNET" , myColor = "#7acc37")
WikiPathways_2019_Human
plotEnrichR(NoEvsTSC.up.enriched, table="WikiPathways_2019_Human" , "#7acc37")
WikiPathways_2019_Mouse
plotEnrichR(NoEvsTGC.up.enriched, table="WikiPathways_2019_Mouse" , "#7acc37")
KEGG_2019_Mouse
plotEnrichR(NoEvsTGC.up.enriched, table="KEGG_2019_Mouse" , "#7acc37")
up in mTSC (No_EV vs. mTSC)
DisGeNET
plotEnrichR(NoEvsTSC.dw.enriched, table="DisGeNET" , myColor = "#0178c7")
WikiPathways_2019_Human
plotEnrichR(NoEvsTSC.dw.enriched, table="WikiPathways_2019_Human" , "#0178c7")
WikiPathways_2019_Mouse
plotEnrichR(NoEvsTSC.dw.enriched, table="WikiPathways_2019_Mouse" , "#0178c7")
KEGG_2019_Mouse
plotEnrichR(NoEvsTSC.dw.enriched, table="KEGG_2019_Mouse" , "#0178c7")
Overlap analyses
For the top quartile (expression) miRNAs, targets were predicted in the previous sheet. Here we will import them (both target gene lists as well as brain specific target gene lists) and perform overlap analyses with the bulk RNAseq results.
<- readRDS("results/brainGeneLists.rds")
brainGeneLists <- readRDS("results/targetsLists_75pc.rds") targetLists
<- function(deTable = NoEvsTSC.DE.table,
mergeTargetsWithDEG targetTable = targetLists$miRNAs_75pc_pTGC,
brainGenes = brainGeneLists$brEnrTargets_75pc_pTGC) {
# thresholds
= 1.5
fc = log(fc, base = 2)
log2fc = log2fc * -1
neg.log2fc # make clean DE table
<- deTable %>%
cleanDE ::select(Gene, log2FoldChange, padj, external_gene_name, description) %>%
dplyrmutate(Gene = gsub("\\..*", "", Gene))
# make clean brain gene list
<- as.data.frame(brainGenes) %>%
brainPAV rename(SYMBOL = 1) %>%
mutate(SYMBOL = toupper(SYMBOL), BrainSpecific = 1)
# merge targets with clean tables
<-
mergedTable %>%
targetTable left_join(cleanDE, by = c("ENSEMBL" = "Gene")) %>%
mutate(SYMBOL = toupper(SYMBOL)) %>%
mutate(RNAseq = case_when((log2FoldChange > log2fc &
<= 0.05) ~ 'up',
padj < neg.log2fc &
(log2FoldChange <= 0.05) ~ 'down',
padj TRUE ~ 'NA'
%>%
)) ::select(mirbase_id, ENSEMBL, SYMBOL, RNAseq, log2FoldChange, padj, description) %>%
dplyrleft_join(as_tibble(brainPAV), by = "SYMBOL") %>%
filter(mirbase_id != "") %>% distinct()
mergedTable }
<-
createSummaryTable function(mergedTabe = NoEvsTSC.DE_with_pTGC.75pc_targets) {
<- mergedTabe %>%
t1 group_by(mirbase_id, RNAseq) %>%
summarise(BrainIntersecting = sum(!is.na(BrainSpecific))) %>%
filter(mirbase_id != "") %>%
group_by(mirbase_id) %>%
spread(RNAseq, BrainIntersecting) %>%
select(mirbase_id, down, up, `NA`) %>%
rename(brainDownDE = down,
brainUpDE = up,
Brain.NA = `NA`) %>%
mutate(totalBrain = sum(brainDownDE,
Brain.NA,
brainUpDE, na.rm =TRUE))
<- mergedTabe %>%
t2 group_by(mirbase_id, RNAseq) %>%
summarise(AllIntersecting = n()) %>%
filter(mirbase_id != "") %>%
group_by(mirbase_id) %>%
spread(RNAseq, AllIntersecting) %>%
select(mirbase_id, down, up, `NA`) %>%
rename(
TargetsDownDE = down,
TargetsUpDE = up,
TargetsNA = `NA`
%>%
) mutate(totalTargets = sum(TargetsDownDE,
TargetsUpDE,
TargetsNA, na.rm =TRUE))
<- inner_join(t1, t2, by = "mirbase_id")
summaryTable
summaryTable }
.75pc_targets <-
NoEvsTSC.DE_with_mTSCmergeTargetsWithDEG(
deTable = NoEvsTSC.DE.table,
targetTable = targetLists$miRNAs_75pc_mTSC,
brainGenes = brainGeneLists$brEnrTargets_75pc_mTSC
).75pc_summary <-
NoEvsTSC.DE_with_mTSCcreateSummaryTable(NoEvsTSC.DE_with_mTSC.75pc_targets)
write_delim(
.75pc_targets ,
NoEvsTSC.DE_with_mTSC"results/NoEvsTSC.DE_with_mTSC.75pc_targets.tsv",
delim = "\t"
)write_delim(
.75pc_summary,
NoEvsTSC.DE_with_mTSC"results/NoEvsTSC.DE_with_mTSC.75pc_summary.tsv",
delim = "\t"
)
.75pc_targets <-
NoEvsTGC.DE_with_pTGCmergeTargetsWithDEG(
deTable = NoEvsTGC.DE.table,
targetTable = targetLists$miRNAs_75pc_pTGC,
brainGenes = brainGeneLists$brEnrTargets_75pc_pTGC
).75pc_summary <-
NoEvsTGC.DE_with_pTGCcreateSummaryTable(NoEvsTGC.DE_with_pTGC.75pc_targets)
write_delim(
.75pc_targets,
NoEvsTGC.DE_with_pTGC"results/NoEvsTGC.DE_with_pTGC.75pc_targets.tsv",
delim = "\t"
)write_delim(
.75pc_summary,
NoEvsTGC.DE_with_pTGC"results/NoEvsTGC.DE_with_pTGC.75pc_summary.tsv",
delim = "\t"
)
MultiQC report
MultiQC report is available at this link
Session Information
sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] grid stats4 stats graphics grDevices utils datasets
#> [8] methods base
#>
#> other attached packages:
#> [1] ggvenn_0.1.10 enrichR_3.1
#> [3] DT_0.27 htmlwidgets_1.5.4
#> [5] kableExtra_1.3.4 scales_1.2.1
#> [7] cowplot_1.1.1 plotly_4.10.1
#> [9] TissueEnrich_1.18.0 GSEABase_1.60.0
#> [11] graph_1.76.0 annotate_1.76.0
#> [13] XML_3.99-0.12 AnnotationDbi_1.60.0
#> [15] ensurer_1.1 biomaRt_2.54.1
#> [17] reshape2_1.4.4 RColorBrewer_1.1-3
#> [19] ggrepel_0.9.3 pheatmap_1.0.12
#> [21] vsn_3.66.0 DESeq2_1.38.3
#> [23] SummarizedExperiment_1.28.0 Biobase_2.58.0
#> [25] MatrixGenerics_1.10.0 matrixStats_0.63.0
#> [27] GenomicRanges_1.50.1 GenomeInfoDb_1.34.3
#> [29] IRanges_2.32.0 S4Vectors_0.36.0
#> [31] BiocGenerics_0.44.0 forcats_0.5.2
#> [33] stringr_1.5.0 dplyr_1.0.10
#> [35] purrr_1.0.1 readr_2.1.3
#> [37] tidyr_1.3.0 tibble_3.1.8
#> [39] ggplot2_3.4.0 tidyverse_1.3.2
#>
#> loaded via a namespace (and not attached):
#> [1] readxl_1.4.1 backports_1.4.1 BiocFileCache_2.6.1
#> [4] systemfonts_1.0.4 plyr_1.8.8 lazyeval_0.2.2
#> [7] crosstalk_1.2.0 BiocParallel_1.32.1 digest_0.6.30
#> [10] htmltools_0.5.3 fansi_1.0.3 magrittr_2.0.3
#> [13] memoise_2.0.1 googlesheets4_1.0.1 tzdb_0.3.0
#> [16] limma_3.54.0 Biostrings_2.66.0 modelr_0.1.10
#> [19] vroom_1.6.0 svglite_2.1.0 timechange_0.1.1
#> [22] rmdformats_1.0.4 prettyunits_1.1.1 colorspace_2.0-3
#> [25] blob_1.2.3 rvest_1.0.3 rappdirs_0.3.3
#> [28] haven_2.5.1 xfun_0.35 crayon_1.5.2
#> [31] RCurl_1.98-1.9 jsonlite_1.8.3 glue_1.6.2
#> [34] gtable_0.3.1 gargle_1.2.1 zlibbioc_1.44.0
#> [37] XVector_0.38.0 webshot_0.5.4 DelayedArray_0.24.0
#> [40] DBI_1.1.3 Rcpp_1.0.9 viridisLite_0.4.1
#> [43] xtable_1.8-4 progress_1.2.2 bit_4.0.5
#> [46] preprocessCore_1.60.2 httr_1.4.4 ellipsis_0.3.2
#> [49] farver_2.1.1 pkgconfig_2.0.3 sass_0.4.3
#> [52] dbplyr_2.2.1 locfit_1.5-9.7 utf8_1.2.2
#> [55] tidyselect_1.2.0 labeling_0.4.2 rlang_1.1.1
#> [58] munsell_0.5.0 cellranger_1.1.0 tools_4.2.2
#> [61] cachem_1.0.6 cli_3.4.1 generics_0.1.3
#> [64] RSQLite_2.2.18 broom_1.0.1 evaluate_0.18
#> [67] fastmap_1.1.0 yaml_2.3.6 knitr_1.41
#> [70] bit64_4.0.5 fs_1.5.2 KEGGREST_1.38.0
#> [73] xml2_1.3.3 compiler_4.2.2 rstudioapi_0.14
#> [76] filelock_1.0.2 curl_4.3.3 png_0.1-7
#> [79] affyio_1.68.0 reprex_2.0.2 geneplotter_1.76.0
#> [82] bslib_0.4.1 stringi_1.7.8 highr_0.9
#> [85] lattice_0.20-45 Matrix_1.5-3 vctrs_0.6.2
#> [88] pillar_1.8.1 lifecycle_1.0.3 BiocManager_1.30.19
#> [91] jquerylib_0.1.4 data.table_1.14.6 bitops_1.0-7
#> [94] R6_2.5.1 affy_1.76.0 bookdown_0.33
#> [97] codetools_0.2-18 assertthat_0.2.1 rjson_0.2.21
#> [100] withr_2.5.0 GenomeInfoDbData_1.2.9 parallel_4.2.2
#> [103] hms_1.1.2 rmarkdown_2.18 googledrive_2.0.0
#> [106] lubridate_1.9.0