Single-cell transcriptomes were generated using a commercially available 384-well plate approach (SORT-Seq2, Single Cell Discoveries, SCD (Muraro et al,
2016)). To this end, primary tumors were dissected and dissociated into single cells as for establishment of organoid lines from tumors. After erythrocyte lysis in ACK buffer, cells were incubated with Fc Block and antibodies (concentrations listed in Appendix Table
S2) in PBS/2% FCS/2 mM EDTA. Single viable cells (eFluor-) were gated for EPCAM+ or CD45+ or double negative cells. EPCAM−, CD45− cells were further gated for CD31 expression and triple-negative cells were considered fibroblasts for sequencing. Cells were index-sorted into 384-well capture plates containing 50 µL lysis buffer and barcoded primers covered by 10 µL of mineral oil by flow cytometry using a BD FACSAria Fusion sorter (BD biosciences) and capture plates were sent for paired-end sequencing at SCD (Illumina Nextseq™ 500). Sequences from read 1 were used for assigning reads to cells and libraries, whereas read 2 was aligned to the ensemble transcriptome (genome assembly GRCm38) using
bwa version 0.7.10 (Li and Durbin,
2010). The transcript count table was generated by SCD using a custom-written script (
https://github.com/anna-alemany/transcriptomics/tree/master/mapandgo). Transcript counts and metadata for all samples were imported and stored as a single-cell experiment object (
SingleCell-Experiment (Amezquita et al,
2020) version 1.12.0). During quality control, cells with high mitochondrial content (
isOutlier, scater (McCarthy et al,
2017) version 1.18.5) were discarded to remove low-quality cells that may have been damaged during processing or may not have been fully captured by the sequencing protocol. For analysis of CAFs, cells with high expression of
Ptprc and
Epcam genes were excluded to avoid contamination by residual immune/epithelial cells. The samples were normalized by computing the log
2-transformed normalized expression values across all genes for each cell (
logNormCounts, scater). Next, all batches were subset to the common features across all samples to enable downstream analysis. To account for the differences in samples due to plates (“batch effect”), we used the fastMNN algorithm (
correctExperiments, batchelor (Haghverdi et al,
2018) version 1.6.2). Subsequently, clusters were identified using a graph-based approach (
buildSNNGraph with k set to 15, scran (Lun et al,
2016) version 1.18.5) and the walktrap algorithm (igraph version 1.2.6). Cluster-specific marker genes were identified using the
findMarkers function (scran) that uniquely define one cluster against the rest. Functional enrichment analysis in Metascape (
https://metascape.org/) and Enrichr (
https://maayanlab.cloud/Enrichr/) was applied to identify pathways and processes that were enriched in each cluster based on differentially expressed genes (FDR ≤ 0.1;
P ≤ 0.05). Furthermore, expression profiles of identified clusters were compared with those previously published for CAF subtypes using the
AddModuleScore function (Tirosh et al,
2016) from Seurat (Hao et al,
2021) version 4.0.0. To this end, each cell was assigned a score using the module of genes associated with the “published clusters”. A positive score suggested that this module of genes is expressed in a particular cell more than would be expected, given the average expression of this module across the population. A mean score for cluster-specific cells was calculated to obtain scores per cluster for each module of genes associated with “published clusters”. Using a reference dataset with known labels, the SingleR (Aran et al,
2019) approach (version 1.4.1) labels new cells from a test dataset based on similarity to the reference. Thereby, we assessed the similarity between clusters from Fib
ΔZeb1 and Fib
Ctrl, where Fib
Ctrl was set as a reference. For the scRNA-seq dataset from non-inflammation-driven orthotopic tumors, lower quality clusters with indistinct or ambiguous cell-type identities were excluded by further sub-clustering.
The DAseq tool (Zhao et al,
2021) was used to study the differential abundance of cells from AOM/DSS scRNA Fib
Ctrl and Fib
ΔZeb1, using read counts after basic quality control filtering implemented in the standard Seurat (Hao et al,
2021) workflow. Cells with 100–6000 detected genes and less than 10% mitochondrial counts were included. The entire dataset which includes Fib
Ctrl and Fib
ΔZeb1 samples, was used to normalize the data, identify the highly variable genes, scale the data, compute the PCA, use nearest neighbor embedding, integrate the batches using harmony and recluster the data. Cell phenotypes were assigned by using Seurat’s
AddModuleScore and published gene sets (Bartoschek et al,
2018; Elyada et al,
2019). DAseq was applied to identify differential abundance (Zhao et al,
2021) and three differentially abundant cell populations were identified. One contained almost exclusively cells from one SORT-Seq plate and was therefore neglected. Statistical significance is reported in the respective figures or figure legends. Characteristic genes for the two other DA populations were identified using the marker gene function included in the DAseq package, which uses stochastic gates.