Basal–epithelial subpopulations underlie and predict chemotherapy resistance in triple-negative breast cancer

Triple-negative breast cancer (TNBC) is the most aggressive breast cancer subtype, characterized by extensive intratumoral heterogeneity, high metastasis, and chemoresistance, leading to poor clinical outcomes. Despite progress, the mechanistic basis of these aggressive behaviors remains poorly understood. Using single-cell and spatial transcriptome analysis, here we discovered basal epithelial subpopulations located within the stroma that exhibit chemoresistance characteristics. The subpopulations are defined by distinct signature genes that show a frequent gain in copy number and exhibit an activated epithelial-to-mesenchymal transition program. A subset of these genes can accurately predict chemotherapy response and are associated with poor prognosis. Interestingly, among these genes, elevated ITGB1 participates in enhancing intercellular signaling while ACTN1 confers a survival advantage to foster chemoresistance. Furthermore, by subjecting the transcriptional signatures to drug repurposing analysis, we find that chemoresistant tumors may benefit from distinct inhibitors in treatment-naive versus post-NAC patients. These findings shed light on the mechanistic basis of chemoresistance while providing the best-in-class biomarker to predict chemotherapy response and alternate therapeutic avenues for improved management of TNBC patients resistant to chemotherapy.

Thank you for the submission of your manuscript to EMBO Molecular Medicine, and please accept my apologies for the delay in getting back to you in this busy time of the year. As you will see below, the referee who was consulted on your revised manuscript, previous reports and rebuttal letter is supportive of publication pending additional clarifications, and I am therefore pleased to inform you that we will be able to accept your manuscript once the following points have been addressed:
1/ Please address the minor comments and questions from the referee and provide a .docx-formatted letter INCLUDING the reviewer's report and your detailed point-by-point responses to his/her comments. As part of the EMBO Press transparent editorial process, the point-by-point response is part of the Review Process File (RPF), which will be published alongside your paper.
2/ Manuscript text:
-Please provide a .docx-formatted version of the manuscript text (including legends for main figures, EV figures and tables). Please make sure that the changes are highlighted to be clearly visible.
-Please remove 'Data not shown' (p13). As per our guidelines on "Unpublished Data", the journal does not permit citation of "Data not shown". All data referred to in the paper should be displayed in the main or Expanded View figures.
-The Methods section should be renamed "Materials and Methods". Please make sure that the information provided matches the author checklist.
-It is mandatory to include a 'Data Availability' section after the Materials and Methods. Before submitting your revision, primary datasets produced in this study need to be deposited in an appropriate public database, and the accession numbers and database listed under 'Data Availability'. Please remember to provide a reviewer password if the datasets are not yet public (see https://www.embopress.org/page/journal/17574684/authorguide#dataavailability). Note that the Data Availability Section is restricted to new primary data that are part of this study.
-Acknowledgements: Please make sure that the funding information provided in the manuscript matches the information provided in the submission system.
-Author contributions: CRediT has replaced the traditional author contributions section because it offers a systematic machine-readable author contributions format that allows for more effective research assessment. Please remove the Authors Contributions from the manuscript and use the free text boxes beneath each contributing author's name in our system to add specific details on the author's contribution. More information is available in our guide to authors.
-Please rename "Competing interests" to "Disclosure statement and competing interests": We updated our journal's competing interests policy in January 2022 and request authors to consider both actual and perceived competing interests. Please review the policy https://www.embopress.org/competing-interests and update your competing interests if necessary.
-Please reformat the references to have them in alphabetical order, with 10 authors listed before et al.
-Our journal encourages inclusion of *data citations in the reference list* to directly cite datasets that were re-used and obtained from public databases. Data citations in the article text are distinct from normal bibliographical citations and should directly link to the database records from which the data can be accessed. In the main text, data citations are formatted as follows: "Data ref: Smith et al, 2001" or "Data ref: NCBI Sequence Read Archive PRJNA342805, 2017". In the Reference list, data citations must be labeled with "[DATASET]". A data reference must provide the database name, accession number/identifiers and a resolvable link to the landing page from which the data can be accessed at the end of the reference. Further instructions are available at .
4/ At EMBO Press we ask authors to provide source data for the main figures. Our source data coordinator will contact you to discuss which figure panels we would need source data for and will also provide you with helpful tips on how to upload and organize the files.
5/ Please provide a complete author checklist, which you can download from our author guidelines (https://www.embopress.org/page/journal/17574684/authorguide#submissionofrevisions). Please insert information in the checklist that is also reflected in the manuscript. The completed author checklist will also be part of the RPF.
6/ Please provide 'The paper explained' section: EMBO Molecular Medicine articles are accompanied by a summary of the articles to emphasize the major findings in the paper and their medical implications for the non-specialist reader. Please provide a draft summary of your article highlighting -the medical issue you are addressing, -the results obtained and -their clinical impact. This may be edited to ensure that readers understand the significance and context of the research. Please refer to any of our published articles for an example.
7/ Every published paper now includes a 'Synopsis' to further enhance discoverability. Synopses are displayed on the journal webpage and are freely accessible to all readers. They include a short stand first (maximum of 300 characters, including spaces) as well as 2-5 one-sentence bullet points that summarize the paper. Please write the bullet points to summarize the key NEW findings. They should be designed to be complementary to the abstract, i.e. not repeat the same text. We encourage inclusion of key acronyms and quantitative information (maximum of 30 words / bullet point). Please use the passive voice. Please attach these in a separate file or send them by email, we will incorporate them accordingly.
Please also suggest a striking image or visual abstract to illustrate your article as a PNG file 550 px wide x 300-600 px high.
8/ As part of the EMBO Publications transparent editorial process initiative (see our Editorial at http://embomolmed.embopress.org/content/2/9/329), EMBO Molecular Medicine will publish online a Review Process File (RPF) to accompany accepted manuscripts. This file will be published in conjunction with your paper and will include the anonymous referee reports, your point-by-point response and all pertinent correspondence relating to the manuscript. Let us know whether you agree with the publication of the RPF as is, or whether you want to remove any figures from it prior to publication. Please note that the Authors checklist will be published at the end of the RPF.
A suitable regression model chosen to select genes for chemotherapy resistance prediction with potential clinical implications. This is the most interesting and original outcome of the paper. Parts of the paper extensively discuss EMT states in chemoresistance, which is, in my opinion, not so novel.
Referee #1 (Remarks for Author):
In the presented manuscript, the Authors re-analyze published datasets of single-cell RNA sequencing of triple-negative breast cancer (TNBC) patients and find that basal epithelial cells exhibit higher levels of chemoresistance markers. Thus, they derive a signature of 101 shared marker genes (at least across 2 out of 3 used datasets). It is then shown that this signature is also enriched in the comparison of chemoresistant vs chemosensitive patients' single-cell data in both the pre- and post-treatment stages, suggesting that this signature might contain predictive potential on chemoresistance. Indeed, the Authors develop a predictive model based on regression with elastic net regularization and narrow down the signature to 20 genes, which their model uses to predict the chemotherapy response. The regression model is then benchmarked against other published predictors and it is shown that it has superior performance. The manuscript contains a large number of results and the Authors integrated many different datasets to prove their points. This is, on one hand, impressive; on the other hand, it is not always easy for the reader to follow the "storyline". I have a couple of remarks and questions that will hopefully help clarify the flow of the paper.
1. Figure 1B: This figure represents a combination of 3 scRNA-seq datasets from TNBC patients. Which of the cells shown are actually tumor cells? Apart from the clusters which are clearly TME or infiltration (T cells, stroma...), to me the big cluster of luminal epithelial cells in TNBC is quite surprising. My current interpretation is that the Luminal and Basal cells are cancer cells scoring high in the respective markers, but is that correct? CNV inference (perhaps through the inferCNV R package) could help show which cells are indeed modified cancer cells, or annotation from the source papers.
2. Figure 1B: How did the Authors correct for batch effect? This is a rather important detail that should be in the Methods.
3. Signature of 101 genes: The Authors decided to overlap marker genes in all three datasets separately and focus on the shared markers. Once the datasets have been integrated (in Fig 1B), wouldn't it be more straightforward to define markers directly from the Basal-like cluster in Fig 1B? Could the Authors comment on that?
4. Figure 1E shows that the population of basal epithelial cells is uniquely located at the border between the tumor and TME, but the signature is expressed across the whole fibrous region. If the signature is specific, shouldn't its expression correspond to the locations similar to those in the middle panel?
5. Figure 2A and B: Why do the Authors not annotate the cells in these data like in Fig 1A? For example, the significant difference between Chemores. and Chemosens. in Fig 2A in the post-treatment setting is clearly driven by a subset of the cells (forming the top "bump" in the violin plots). These might be, again, the basal-epithelial cells perhaps.
6. Figure 3E shows the expression of the EMT markers in the form of Z-score, I guess, but the Authors refer to the plot as "correlation matrix". This should be clarified. Also, the hierarchical clustering of the samples (vertical dimension) should be shown, since by that means the samples are stratified to form the plot in Fig 3F.
7. Fig 4B: The selected subtypes are significantly upregulated compared to Low_CIN? The statistics is not shown.
8.
Figure 5B: Did the Authors test the capacity of their predictive model also with an even more reduced gene set? Naively, looking into the score table, the genes 15 to 20 are less informative than ITGB1 alone. How is the Score actually linked to the LASSO coefficient plot in Fig S3E, which depicts the feature selection upon attenuation of the lambda-regulator strength?
I personally like the regression model that the Authors have chosen to build the predictor; I believe it is a suitable choice given the sizes of the training data and number of putative predictors. I also appreciate the benchmarking section.
16th Jan 2024 1st Authors' Response to Reviewers
derived from Karaayvaz et al, published in Nature Communications (Karaayvaz et al, 2018). The remaining two scRNA datasets (Chung et al, 2017; Gulati et al, 2020) were processed and analyzed separately (shown in Expanded View Figure EV1A-C). Our analysis showed that the basal epithelial cells expressed the highest levels of genes known to be associated with tumor aggressiveness, including metastasis and chemoresistance (Fig. 1C of submitted and revised manuscript). We therefore took the genes that overlapped between the basal cells of the three datasets as signature genes for further downstream analysis.
Regarding the observation of a big cluster of luminal epithelial cells, it is in line with previous studies including the source study where most cells were found to be of epithelial identity (Figures 1C and 2A

Prompted by the reviewer's comments, we further investigated this in our data using markers of luminal and basal epithelial types from the source study (Fig. 3
The upper heatmap was generated by subtracting the expression profile of normal epithelial cells from the expression data of TNBC epithelial cells, highlighting differences. Regions of chromosomal amplification manifest as blocks of red, while chromosomal deletions manifest as blue blocks, providing a visual representation of the copy number changes.
In addition to the above CNV heatmap, we further confirmed the malignant nature of these cells by scoring each cell based on the extent of CNV signal identified through inferCNV. This scoring was derived from the number of genes exhibiting copy number alterations (CNA) in each cell, as obtained from infercnv_obj@expr.data in the inferCNV output. Subsequently, putative malignant cells were discerned based on their inferCNV scores. Lower scores signified a diminished CNV signal, while higher scores indicated a heightened CNV signal within the cells. Plotting these scores across all cells revealed a bimodal distribution split around an inferCNV score of 0.2 (Figure 6A). Notably, cells scoring less than 0.2 predominantly comprised normal mammary epithelial cells, whereas cells scoring above 0.2 were predominantly triple-negative breast cancer (TNBC) epithelial cells (Figure 6A and B). This distribution was further dissected across each cell type, underscoring that the majority of TNBC cells, including basal epithelial cells, exhibited a heightened CNV signal compared to normal mammary epithelial cells (Figure 6C and D). These data have been included in the revised manuscript (New figure: Figure 1E).
In summary, these observations clearly demonstrate that most of the TNBC epithelial cells, including basal epithelial cells, are malignant. These findings also align with the source study (Karaayvaz et al., 2018), where the majority of epithelial cells were classified as malignant cells. We thank the reviewer for motivating us to perform these analyses as it has added new, relevant supporting data to our manuscript.
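The per-cell CNV scoring described above can be illustrated with a minimal Python sketch. It is not the analysis code used in the study, which was run in R on the inferCNV output. The neutral value of 1.0, the tolerance of 0.1 for calling a gene altered, and the toy cell values are all assumptions made for illustration; only the 0.2 score threshold comes from the response text.

```python
def cnv_score(cell_values, neutral=1.0, tolerance=0.1):
    """Fraction of genes whose value departs from `neutral` by more than `tolerance`.

    `neutral` and `tolerance` are hypothetical choices for this sketch.
    """
    altered = sum(1 for v in cell_values if abs(v - neutral) > tolerance)
    return altered / len(cell_values)


def classify(score, threshold=0.2):
    """Label a cell using the 0.2 score threshold quoted in the response."""
    return "putative malignant" if score > threshold else "putative normal"


# Toy per-gene values for two made-up cells (not real data).
cells = {
    "normal_like": [1.0, 1.02, 0.98, 1.05, 0.97],
    "tumor_like": [1.3, 0.7, 1.0, 1.25, 0.95],
}

scores = {name: cnv_score(values) for name, values in cells.items()}
labels = {name: classify(score) for name, score in scores.items()}

for name in cells:
    print(name, scores[name], labels[name])
```

In this toy input the tumor-like cell has three of five genes outside the tolerance band, so it scores above the threshold, while the normal-like cell has none.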
Comment 2) Figure 1B: How did the Authors correct for batch effect? This is a rather important detail that should be in the Methods.
Author's response: We agree with the reviewer that batch correction is a very critical step in such a workflow, and hence we paid serious attention to this prior to the analysis. Here we corrected the batch effect using the established canonical correlation analysis (CCA) method in Seurat and mentioned this in our Methods section as follows: "Batch effects across multiple samples were regressed out and the integration of scRNA-seq datasets was done using the canonical correlation analysis (CCA) method in Seurat." As a reflection of a successful batch correction, it is clearly seen that cells are clustered based on the cell type and not patient samples (Figure 7).
Here, within the primary TNBC dataset (used in figure 1 of the submitted manuscript), we could not see any batch effect, as each cluster is contributed by multiple samples and annotated as a distinct cell type (Figure 7).
Author's response: We thank the reviewer for this comment. We apologize for not making it clear that the data shown in Fig. 1B is from a single study and not an integration of all 3 independent datasets.
In fact, we processed and analyzed all 3 datasets individually and overlapped the gene sets of their respective basal epithelial cluster to identify robust gene signatures of tumor aggressiveness.
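As an illustration of the overlap step, the following minimal Python sketch keeps genes that appear as basal-cluster markers in at least two of three datasets, which is the criterion the referee summary describes. The function name and the toy gene lists are hypothetical and are not the study's signature or code.

```python
from collections import Counter


def shared_markers(marker_sets, min_datasets=2):
    """Return, sorted, the genes present in at least `min_datasets` marker sets."""
    counts = Counter(gene for genes in marker_sets for gene in set(genes))
    return sorted(gene for gene, n in counts.items() if n >= min_datasets)


# Hypothetical basal-cluster marker lists, one per dataset (toy values only).
dataset_a = {"ITGB1", "ACTN1", "KRT5", "VIM"}
dataset_b = {"ITGB1", "ACTN1", "KRT14"}
dataset_c = {"ITGB1", "VIM", "EPCAM"}

signature = shared_markers([dataset_a, dataset_b, dataset_c])
print(signature)
```

Raising `min_datasets` to 3 would keep only genes shared by every dataset, a stricter criterion that trades signature size for robustness.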
Nevertheless, as suggested by the reviewer, we have checked whether our gene signature remains intact when we perform these analyses on the integrated single-cell datasets. The clustering of cells showed batch effects, as cells clustered based on the datasets (Figure 9A, left UMAP). These were removed using the CCA method for integration as explained in the earlier comment (Figure 9B, left UMAP), as cells from independent datasets contributed to each cell cluster and annotated as
the Reviewer, we have now also performed hierarchical clustering of the samples (vertical dimension) in Fig. 3E that showed a clear clustering of epithelial and mesenchymal markers (Figure 12). We thank the reviewer for these suggestions. We have updated this figure in the revised manuscript.
λ varies. The axis above indicates the number of nonzero coefficients at the current λ, which is the effective degrees of freedom (df) for the lasso. In the plot from left to right, we observe that at first (Figure 15), the lasso models contain many predictors with high magnitudes of coefficient estimates.
With increasing lambda, the coefficient estimates shrink towards zero. In simple terms, the curves that have high magnitudes of coefficient estimates are stronger predictors in the models than the ones whose coefficient values are close to zero. Next, the lasso coefficient values of these high-magnitude features were ranked using the varImp() function in the caret R package, which places the final coefficients of the fit on a standardized scale and signifies feature importance.
We have provided the code below for your reference on how we computed the feature score.
Figure 15: The coefficients from the lasso fit represent the contributions of the expression of the 20 genes to the model. The right plot shows lasso regression coefficient values, in which each curve corresponds to a variable. It shows the path of its coefficient against the log lambda of the whole coefficient vector as λ varies. The axis above indicates the number of nonzero coefficients at the current λ, which is the effective degrees of freedom (df) for the lasso. Variables with a higher magnitude of lasso coefficient were scaled using the caret R package to assess and rank feature importance.
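To make the link between the coefficient path and the feature score concrete, here is a minimal Python sketch. It is not the authors' R code, which is not reproduced in this file. It assumes the textbook soft-thresholding form of the lasso, which holds exactly only for an orthonormal design, and it assumes that the importance score is the absolute coefficient rescaled to 0-100 by min-max scaling, which is how caret's varImp() with scaling is commonly understood to behave. Gene names and coefficient values are hypothetical.

```python
def soft_threshold(beta, lam):
    """Lasso estimate of one coefficient under an orthonormal design (assumption)."""
    magnitude = max(abs(beta) - lam, 0.0)
    return magnitude if beta >= 0 else -magnitude


def scaled_importance(coefs):
    """Min-max rescale absolute coefficients to 0-100, largest first."""
    abs_vals = {gene: abs(c) for gene, c in coefs.items()}
    lo, hi = min(abs_vals.values()), max(abs_vals.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    scored = {gene: 100.0 * ((v - lo) / span) for gene, v in abs_vals.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical unpenalized coefficients for three made-up genes.
unpenalized = {"GENE_A": 0.9, "GENE_B": -0.5, "GENE_C": 0.1}

# Coefficient path: as lambda grows, more coefficients are shrunk to zero.
nonzero_counts = []
for lam in (0.0, 0.2, 0.6):
    path = {g: soft_threshold(b, lam) for g, b in unpenalized.items()}
    nonzero_counts.append(sum(1 for v in path.values() if v != 0.0))
    print(lam, nonzero_counts[-1], path)

# Feature score at one chosen lambda (0.2 here, an arbitrary choice).
chosen = {g: soft_threshold(b, 0.2) for g, b in unpenalized.items()}
ranking = scaled_importance(chosen)
print(ranking)
```

The nonzero counts play the role of the upper axis in the coefficient path plot, and the ranking shows why a gene whose coefficient is shrunk to zero at the chosen λ ends up with a score of 0.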
References:
Thank you for the submission of your revised manuscript to EMBO Molecular Medicine. We have now received the feedback from the referee who was consulted on your manuscript. As you will see below, he/she is supportive of publication, and I will therefore be able to accept your manuscript once the following editorial points will be addressed:
Problem
Triple-Negative Breast Cancer (TNBC) is the most aggressive type of breast cancer and is hard to treat. It spreads fast, doesn't respond well to chemotherapy, and often leads to poor outcomes in patients. Despite advances in the field, the molecular basis of these aggressive behaviors remains poorly understood.

Results
In this study, we used advanced techniques that allow a closer look at the TNBC tumor cells and their genes at single-cell and spatial resolution. This analysis identified specific groups of cells in the tumor that exhibit resistance to chemotherapy. Furthermore, these cells express certain genes that are highly active and predictive of future response to chemotherapy. Interestingly, high levels of ITGB1 improve cell communication, and ACTN1 expression gives cells a survival advantage, fostering resistance to chemotherapy. Furthermore, we identified existing drugs that may be repurposed against chemoresistant tumors.
Impact
Our findings provide an explanation of why certain TNBC tumors are resistant to chemotherapy and propose a biomarker for predicting a patient's response to chemotherapy. This work opens avenues for precision medicine, providing a stratification biomarker and alternative therapies for better managing TNBC patients resistant to traditional chemotherapy.
7/ I introduced minor modifications in your synopsis text, please let me know if you agree or amend as you see fit:
Chemotherapy resistance is a key challenge in Triple-Negative Breast Cancer (TNBC). Combining single-cell, spatial and bulk transcriptome analysis with machine learning, we uncovered mechanisms of TNBC chemoresistance that provide biomarkers for chemotherapy response and novel avenues for therapy.
• Chemoresistance-associated basal-epithelial cells reside in close vicinity to stromal compartments within TNBC tumors and engage in enhanced intercellular communication.
• These subpopulations are defined by distinct signature genes that provide the best-in-class predictive biomarker of chemotherapy response.
• Drug repurposing analysis identified existing FDA-approved drugs that may benefit chemoresistant patients.
8/ Please let us know whether you agree with the publication of the Review Process File as is, or if you want to remove any figure. As mentioned previously, the RPF would only include reviewer comments and information related to the peer review at EMBO Press. Any information prior to transfer would not be part of this file.
Relevant question to be asked (from medical perspective), well chosen regression model, extensive benchmarking. To prove the "real" medical impact of the proposed predictor, future work is still needed.
Referee #1 (Remarks for Author): I thank the Authors for their extensive response and clarifications.

7th Feb 2024 2nd Authors' Response to Reviewers
The authors addressed the remaining editorial issues.
14th Feb 2024 2nd Revision -Editorial Decision
14th Feb 2024
Dear Vijay,
Thank you for sending the revised files. I am pleased to inform you that your manuscript is accepted for publication and is now being sent to our publisher to be included in the next available issue of EMBO Molecular Medicine.
We note that there is an additional panel in the new Figure EV3, please carefully check the file and send us the corrected figure as soon as possible. Your manuscript will be processed for publication by EMBO Press. It will be copy edited and you will receive page proofs prior to publication. Please note that you will be contacted by Springer Nature Author Services to complete licensing and payment information.
You may qualify for financial assistance for your publication charges, either via a Springer Nature fully open access agreement or an EMBO initiative. Check your eligibility: https://www.embopress.org/page/journal/17574684/authorguide#chargesguide
Should you be planning a Press Release on your article, please get in contact with embo_production@springernature.com as early as possible in order to coordinate publication and release dates.
If you have any questions, please do not hesitate to contact the Editorial Office. Thank you for your contribution to EMBO Molecular Medicine.

Lise
Lise Roth, Ph.D
Senior Editor
EMBO Molecular Medicine
>>> Please note that it is EMBO Molecular Medicine policy for the transcript of the editorial process (containing referee reports and your response letter) to be published as an online supplement to each paper. If you do NOT want this, you will need to inform the Editorial Office via email immediately. More information is available here: https://www.embopress.org/transparentprocess#Review_Process

EMBO Press Author Checklist USEFUL LINKS FOR COMPLETING THIS FORM
The EMBO Journal -Author Guidelines EMBO Reports -Author Guidelines Molecular Systems Biology -Author Guidelines EMBO Molecular Medicine -Author Guidelines Please note that a copy of this checklist will be published alongside your article.

Abridged guidelines for figures 1. Data
The data shown in figures should satisfy the following conditions:
New materials and reagents need to be available; do any restrictions apply? Not Applicable

Antibodies
Information included in the manuscript?
In which section is the information available?
(Reagents and Tools Cell lines: Provide species information, strain.Provide accession number in repository OR supplier name, catalog number, clone number, and/OR RRID.

Yes
The details of the cell lines used in the study are included in "Method" section as well as figure legends.
Primary cultures: Provide species, strain, sex of origin, genetic modification status.

Not Applicable
Report if the cell lines were recently authenticated (e.g., by STR profiling) and tested for mycoplasma contamination.

Yes
All the cell lines used in the study were tested for mycoplasma and found to be negative before the study.

Experimental animals Information included in the manuscript?
In which section is the information available?
(Reagents and Tools

Reporting
Adherence to community standards Information included in the manuscript?
In which section is the information available?
(Reagents and Tools Have primary datasets been deposited according to the journal's guidelines (see 'Data Deposition' section) and the respective accession numbers provided in the Data Availability Section?

Data availability section
Were human clinical and genomic datasets deposited in a public access-controlled repository in accordance to ethical obligations to the patients and to the applicable consent agreement?

Not Applicable
Are computational models that are central and integral to a study available without restrictions in a machine-readable form? Were the relevant accession numbers or links provided?

Not Applicable
If publicly available data were reused, provide the respective data citations in the reference list.
Yes
Provided data citation as per the EMBO guidelines
The MDAR framework recommends adoption of discipline-specific guidelines, established and endorsed through community initiatives. Journals have their own policy about requiring specific guidelines and recommendations to complement MDAR.
Comments on Novelty/Model System for Author):
Minor comments:
1. Fig 1A: consistency in annotation "3,985 cells" and "6862 cells"
2. Suppl Fig1B and C, typos in the titles of the plots
3. Typo page 20, top word: "...genes were further analyses for..." should be analyzed
RESPONSES TO REVIEWERS
Reviewer 1 (Remarks to the Author):
We thank the reviewer for her/his exciting remarks, "A suitable regression model chosen to select genes for chemotherapy resistance prediction with potential clinical implications. This is the most interesting and original outcome of the paper." The reviewer further comments "In the presented manuscript, the Authors re-analyze published datasets of single-cell RNA sequencing of triple-negative breast cancer (TNBC) patients and find that basal epithelial cells exhibit higher levels of chemoresistance markers. Thus, they derive a signature of 101 shared marker genes (at least across 2 out of 3 used datasets). It is then shown that this signature is also enriched in the comparison of chemoresistant vs chemosensitive patients' single-cell data in both the pre- and post-treatment stages, suggesting that this signature might contain predictive potential on chemoresistance. Indeed, the Authors develop a predictive model based on regression with elastic net regularization and narrow down the signature to 20 genes, which their model uses to predict the chemotherapy response. The regression model is then benchmarked against other published predictors and it is shown that it has superior performance."
The reviewer had a couple of remarks and questions to help clarify the flow of the paper, which we have implemented as follows:
Comment 1) 1.
Figure 1B represents a single scRNA dataset

Figure 1C (source study): Bar plot depicting the distribution of the 1112 cells assigned to specific cell types, by patient.
Figure 2A (source study): t-SNE plot of all 1112 classified cells, demonstrating separation of non-epithelial cells by cell type.
2a-f of the source study). Importantly, however, the authors here did not further classify these epithelial cells into luminal and basal malignant cells.

Figure 2 (source study): Clustering, genomic CNVs, and correlation maps classify most epithelial cells as malignant. a t-SNE plot of all 1112 classified cells, demonstrating separation of non-epithelial cells by cell type. b t-SNE plot of the 244 non-epithelial cells, demonstrating separation by cell type, and no distinguishable patient effect. c t-SNE plot of the 868 epithelial cells, showing mixed separation by patient, and substantial clustering of cells from different patients, suggesting pronounced intra-tumor heterogeneity. d Inferred CNVs from the single-cell gene expression data. Columns represent individual cells, and rows represent a selected set of genes, arranged according to their genomic coordinates (chromosome number indicated at left). A set of 240 normal mammary epithelial cells is shown on the left for comparison, and epithelial cells from all TNBC cases are shown, clustered separately for each patient. Amplifications (red) or deletions (blue) are inferred by computing, for each gene, a 100-gene moving average expression score, centered at the gene of interest. Prominent subclones defined by shared CNVs in tumors 39 and 81 are indicated by brackets on the top ("clonal"). e WES data for four of the six TNBC cases demonstrates high concordance with the CNV calls inferred from the transcriptomes of single cells (d). Genomic coordinates are arranged as in d from top to bottom, and mean copy number for each region ("CNV mean") is indicated on a continuous scale, with red representing gain and blue representing loss. Accordingly, scanning from left (d) to right (e) allows for a comparison of inferred CNVs (d) and actual CNVs (e) for the same regions. f Correlation map among the expression profiles of the normal epithelial cells and the TNBC epithelial cells, depicted in the same order from left to right as d. Normal cells, as well as malignant clonal subpopulations defined by shared CNVs for tumors 39 and 81 (indicated as "clonal" at top), are correlated. The remaining non-clonal epithelial populations in all tumors show relatively poor correlation, supporting their identity as malignant cells.
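The moving-average step described for panel d can be illustrated with a short sketch. This is a simplified illustration only, not the pipeline of the source study or of inferCNV: the function name, the truncation of the window at the ends of the gene list, and the toy window size are assumptions made for the example.

```python
from statistics import mean


def moving_average_scores(expr, window=100):
    """Centered moving average over genes ordered by genomic position.

    expr   : relative expression values for one cell, ordered by genomic
             coordinate (assumed to be already normalized against
             reference cells).
    window : number of neighbouring genes to average (100 in the legend).

    Assumption: the window covers window // 2 genes on each side of the
    gene of interest and is truncated (not padded) at the ends.
    """
    half = window // 2
    n = len(expr)
    scores = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        scores.append(mean(expr[lo:hi]))
    return scores


# Toy profile: a short run of elevated expression, smoothed with window=3.
toy_profile = [0, 0, 0, 1, 1, 1, 0, 0]
smoothed = moving_average_scores(toy_profile, window=3)
```

Smoothing over neighbouring genes suppresses gene-specific expression noise, so that only regional shifts shared by many adjacent genes remain visible as candidate gains or losses.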
, left heatmap), which confirms the identity of the annotated clusters in our analysis (Fig. 3, right violin plots). Next, we also plotted other markers of malignancy known in the literature (Hu et al, 2023) to stratify malignant cells within the basal and luminal epithelial cell types. Here we have retrieved cancer cell markers of basal and luminal epithelial types of the breast from the CellMarker (Hu et al, 2023) database,

Figure 3: The left plots are from the source study (Figure 1B) and show expression of cell type markers across single cells of primary TNBC tumors. The right plot shows the same markers plotted on our clustered cells, which aligns with the source study, including markers specific to basal epithelial clusters.

Figure 4: The violin plots show expression of malignant cell markers of luminal and basal breast cancer types. The cancer cell markers of the basal and luminal epithelial types were retrieved from the CellMarker database and plotted on our primary TNBC dataset.

Figure 5: The inferCNV analysis classified the majority of TNBC cells as malignant. The upper heatmap shows the copy number alteration profile of healthy mammary epithelial cells. The lower heatmap shows the CNV profile of TNBC epithelial cells. We used a total of 240 healthy mammary epithelial cells to compute copy number alterations in TNBC epithelial cells, including basal epithelial cells.

Figure 6: Copy number scores computed from inferCNV show clear separation of TNBC epithelial versus normal epithelial cells, indicating that the TNBC cells are malignant. A The histogram shows the bimodal distribution of inferCNV scores of normal epithelial versus TNBC epithelial cells. Scores less than 0.2 defined normal epithelial cells and scores greater than 0.2 defined TNBC epithelial cell types. B Boxplot depicting the distribution of inferCNV scores, with a significant difference between normal epithelial and TNBC epithelial cells. C and D show similar inferCNV score profiles across each cell type. The red dotted lines show the inferCNV score threshold separating normal epithelial from TNBC epithelial cells.
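The threshold rule in panel A can be written as a minimal sketch. It assumes that one inferCNV-derived score per cell has already been computed; how that score is derived is not specified in the legend. The function name, the cell identifiers and the handling of a score of exactly 0.2 (treated here as normal) are assumptions for illustration.

```python
def classify_by_cnv_score(scores, threshold=0.2):
    """Label cells from precomputed per-cell CNV scores.

    scores    : dict mapping a cell identifier to its CNV score.
    threshold : cut-off from the legend (0.2).

    Assumption: a score exactly equal to the threshold is labelled
    normal, since the legend only defines "less than" and "greater than".
    """
    labels = {}
    for cell, score in scores.items():
        if score > threshold:
            labels[cell] = "TNBC epithelial"
        else:
            labels[cell] = "normal epithelial"
    return labels


# Hypothetical cell identifiers and scores, for illustration only.
example_scores = {"cell_a": 0.05, "cell_b": 0.35, "cell_c": 0.2}
example_labels = classify_by_cnv_score(example_scores)
```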

Figure 7: The UMAP plot of primary TNBC samples shows no evident batch effect, as cells cluster by cell type and not by individual patient sample.

Figure 8: The upper UMAP plot shows a possible batch effect in the resistant and sensitive datasets. The batch effects were regressed out using CCA and the samples were integrated in Seurat. The bottom UMAP plots show that cells cluster by cell type, indicating removal of the possible batch effects from the datasets.

Figure 9: Integrated analysis of three scRNA-seq datasets shows that the basal epithelial signature remains intact, as in the independent analysis approach. A) The UMAP plot shows clustering of cells from three independent datasets and a possible batch effect, as samples cluster by dataset. B) The UMAP plot shows good integration and batch effect removal after applying the canonical correlation analysis (CCA) method to the datasets. The cells cluster by cell type and are shared across the independent datasets, indicating batch effect removal. C) Expression of aggressive gene signatures from earlier studies, also used in Figure 1C of the revised manuscript. The mean expression of 49 metastasis and 143 chemoresistance signature genes was plotted on the integrated datasets and shows higher enrichment within the basal cluster compared to other cell types. D) The UMAP shows that our signature genes (101 genes from Figure 1G), as well as our predictive gene signature of pCR and RD, remain intact within the basal epithelial cluster. In line with our independent approach, these signatures are enriched only in basal epithelial cells in the integrated datasets.
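As a rough illustration of the "mean expression" of a gene signature plotted in panels C and D, the sketch below averages the expression of signature genes within a single cell. It is a simplified stand-in and not necessarily the scoring method used for the figure; the function name and the expression values are hypothetical, and only ITGB1 and ACTN1 are gene names taken from the manuscript.

```python
def signature_score(cell_expr, signature):
    """Mean expression of the signature genes detected in one cell.

    cell_expr : dict mapping gene symbol to normalized expression.
    signature : iterable of gene symbols defining the signature.

    Assumption: genes absent from the cell's profile are ignored, and a
    cell with none of the signature genes scores 0.0.
    """
    values = [cell_expr[gene] for gene in signature if gene in cell_expr]
    if not values:
        return 0.0
    return sum(values) / len(values)


# Hypothetical expression values for two cells.
basal_like = {"ITGB1": 3.0, "ACTN1": 2.0, "OTHER": 0.5}
other_cell = {"ITGB1": 0.5, "ACTN1": 0.5, "OTHER": 4.0}
signature = ["ITGB1", "ACTN1"]

basal_score = signature_score(basal_like, signature)
other_score = signature_score(other_cell, signature)
```

Averaging per cell and then comparing the scores across annotated cell types is what allows enrichment in one cluster to be read directly from a UMAP or violin plot.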

Figure 10: Spatial analysis of a TNBC tissue section shows the spatial arrangement of basal epithelial cells in close vicinity to the stromal compartment. A The left plot shows a tissue section of an aggressive TNBC tumor; the right plot shows cell type deconvolution of TNBC cell types within the histological section. B The left plot shows mean expression of our signature within the tissue section, and the right boxplot shows expression levels of our gene signature across different cell types of the same spatial dataset.

Figure 11: Cell type annotation of chemoresistant and chemosensitive tumors shows enrichment of our signature genes in post-treatment chemoresistant tumors. A-B. Left and right UMAP plots show cell type annotation of chemosensitive and chemoresistant cells. C. The left UMAP plot shows expression of our signature genes in the chemoresistant dataset. The right violin plot shows expression of our signature across different cell types of the chemoresistant tumor dataset.

Authors' response:
We apologize for the typo in the legend for Fig 3E. Indeed, the expression of EMT markers is shown as a Z-score and we have now corrected it accordingly in the legend. Furthermore, as suggested by

Comment 7)
Fig 4B: The selected subtypes are significantly upregulated compared to Low_CIN? The statistics are not shown.
Authors' response:
In Figure 4B, we investigated the expression of our signature genes among six previously defined CNA subtypes of TNBC (Jiang et al, 2019). These subtypes are: CNA subtype 1, frequent 9p23 amplification (Chr9p23 amp); CNA subtype 2, frequent 12p13 amplification (Chr12p13 amp); CNA subtype 3, frequent Chr13q34 amplification (Chr13q34 amp); [...] amp); CNA subtype 5, frequent Chr8p21 loss (Chr8p21 del); and CNA subtype 6, somatic CNA lacking a CN cluster but with low chromosomal instability (CIN) (low CIN). Indeed, our signature genes were not significantly upregulated within any of these groups, but showed a trend of elevation in the tumors of the frequently amplified subtypes compared to the low CIN group (Figure 13, left boxplot). Consequently, we did not claim in the manuscript that this difference was 'significant'.

Figure 12: Heatmap showing expression profiles of hallmark epithelial and mesenchymal genes across TCGA TNBC tumors. We used the expression of 4 epithelial and 6 mesenchymal markers to classify TNBC tumors into EMT-high (mesenchymal), hybrid and EMT-low (epithelial)-like tumors.

Figure 13: The left boxplot shows expression of our signature in the high copy number versus low CIN groups. The right plot shows expression of our signature genes across mutation type categories in TNBC.

Figure 14: The right plot shows the ranking of features by lasso coefficient value for predictive model building. The left ROC plot shows the performance of our predictive classifier upon removal of the genes ranked 15 to 20.
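The ranking and removal step described in this legend can be sketched as follows. The sketch assumes that features are ranked by the absolute value of their lasso coefficient, which the legend does not state explicitly; the function names, gene labels and coefficient values are hypothetical.

```python
def rank_by_coefficient(coefs):
    """Return feature names sorted by decreasing absolute coefficient.

    Assumption: importance is taken as the absolute coefficient value.
    """
    return sorted(coefs, key=lambda gene: abs(coefs[gene]), reverse=True)


def drop_ranks(ranked, first, last):
    """Remove features whose 1-based rank lies in [first, last]."""
    return [
        gene
        for rank, gene in enumerate(ranked, start=1)
        if not (first <= rank <= last)
    ]


# Twenty hypothetical features with distinct coefficient magnitudes
# and alternating signs.
toy_coefs = {
    f"gene_{i}": (21 - i) / 100 * (-1 if i % 2 else 1) for i in range(1, 21)
}
ranked_genes = rank_by_coefficient(toy_coefs)
reduced_panel = drop_ranks(ranked_genes, 15, 20)
```

Refitting and re-evaluating the classifier on the reduced panel, as in the ROC plot, then indicates how much the lowest-ranked features contribute to predictive performance.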

1/ Manuscript text:
- Please accept previous changes, and only keep in track changes mode any new modification.
- Materials and Methods:
o Cell culture: please indicate the origin of the cells, and whether they were authenticated and tested for mycoplasma contamination.
o Primers sequences should be in the main manuscript.
o Please add a statistics section with mention of blinding, sample size, randomization, etc. (please refer to the authors checklist).
- Data Availability section: please add a URL link to the dataset.
- Please correct the order of the following sections to: Disclosure and competing interests statement, References, Figure legends, Tables and their legends, Expanded View Figure legends.
- Please remove "Supplementary information".
- Please remove the legend for Table EV1 from the manuscript file and add it to the Excel file.
- Figure legends: For EV figures, the section heading should be "Expanded View Figure Legends", and the figures should be named "Figure EV1", etc.
- Data citation: please incorporate the Data citations references to the rest of the references (in alphabetical order).
2/ Figures:
- Please provide exact p values for Figure EV3 panel D.
- Figure 8I contains error bars based on n=2. Please use scatter plots showing the individual datapoints in these cases. The use of statistical tests needs to be justified.
4/ Thank you for providing Source Data. Please upload them as one file per figure.
5/ Checklist: Please complete/correct the following sections:
- Primers sequences
- Cell authentication and mycoplasma
- Experimental study design and statistics.
6/ The Paper Explained: I introduced minor modifications in your text, please let me know if you agree or amend as you see fit:
To submit your manuscript, please follow this link: https://embomolmed.msubmit.net/cgi-bin/main.plex

***** Reviewer's comments *****
Referee #1 (Comments on Novelty/Model System for Author):

In which section is the information available?
definitions of statistical methods and measures: (Reagents and Tools Table, Materials and Methods, Figures, Data Availability Section)

Cell materials Information included in the manuscript? In which section is the information available?
(Reagents and Tools Table, Materials and Methods, Figures, Data Availability Section)

If collected and within the bounds of privacy constraints, report on age, sex and gender or ethnicity for all study participants.

Design
- common tests, such as t-test (please specify whether paired vs. unpaired), simple χ2 tests, Wilcoxon and Mann-Whitney tests, can be unambiguously identified by name only, but more complex techniques should be described in the methods section;

Please complete ALL of the questions below. Select "Not Applicable" only when the requested information is not relevant for your study.
- if n<5, the individual data points from each experiment should be plotted. Any statistical test employed should be justified.
- Source Data should be included to report the data underlying figures according to the guidelines set out in the authorship guidelines on Data

Each figure caption should contain the following information, for each panel where they are relevant:
- a specification of the experimental system investigated (eg cell line, species name).
- the assay(s) and method(s) used to carry out the reported observations and measurements.
- an explicit mention of the biological and chemical entity(ies) that are being measured.
- an explicit mention of the biological and chemical entity(ies) that are altered/varied/perturbed in a controlled manner.
- ideally, figure panels should include only measurements that are directly comparable to each other and obtained with the same assay.
- plots include clearly labeled error bars for independent experiments and sample sizes. Unless justified, error bars should not be shown for technical
- the exact sample size (n) for each experimental group/condition, given as a number, not a range;
- a description of the sample collection allowing the reader to understand whether the samples represent technical or biological replicates (including how many animals, litters, cultures, etc.).
- a statement of how many times the experiment shown was independently replicated in the laboratory.

This checklist is adapted from Materials Design Analysis Reporting (MDAR) Checklist for Authors. MDAR establishes a minimum set of requirements in transparent reporting in the life sciences (see Statement of Task: 10.31222/osf.io/9sm4x). Please follow the journal's guidelines in preparing your
- the data were obtained and processed according to the field's best practice and are presented to reflect the results of the experiments in an accurate and unbiased manner.

Checklist for Life Science Articles (updated January
Study protocol Information included in the manuscript? In which section is the information available?
(Reagents and Tools Table, Materials and Methods, Figures, Data Availability Section)
If study protocol has been pre-registered, provide DOI in the manuscript. For clinical trials, provide the trial registration number OR cite DOI.

Sample definition and in-laboratory replication Information included in the manuscript? In which section is the information available?
(Reagents and Tools Table, Materials and Methods, Figures, Data Availability Section)

Include a statement confirming that informed consent was obtained from all subjects and that the experiments conformed to the principles set out in the WMA Declaration of Helsinki and the Department of Health and Human Services Belmont Report. State details of authority granting ethics approval (IRB or equivalent committee(s), provide reference number for approval. Include a statement of compliance with ethical regulations.
Studies involving human participants: State details of authority granting ethics approval (IRB or equivalent committee(s), provide reference number for approval. Not Applicable
Studies involving human participants: Not Applicable
Studies involving human participants: For publication of patient photos, include a statement confirming that consent to publish was obtained. Not Applicable
Studies involving experimental animals:

Dual Use Research of Concern (DURC) Information included in the manuscript? In which section is the information available?
(Reagents and Tools Table, Materials and Methods, Figures, Data Availability Section)
Could your study fall under dual use research restrictions? Please check biosecurity documents and list of select agents and toxins (CDC): https://www.selectagents.gov/sat/list.htm Not Applicable
If you used a select agent, is the security level of the lab appropriate and reported in the manuscript? Not Applicable
If a study is subject to dual use research of concern regulations, is the name of the authority granting approval and reference number for the regulatory approval provided in the manuscript?

State if relevant guidelines or checklists (e.g., ICMJE, MIBBI, ARRIVE, PRISMA) have been followed or provided. Not Applicable
For tumor marker prognostic studies, we recommend that you follow the REMARK reporting guidelines (see link list at top right). See author guidelines, under 'Reporting Guidelines'. Please confirm you have followed these guidelines.
and III randomized controlled trials, please refer to the CONSORT flow diagram (see link list at top right) and submit the CONSORT checklist (see link list at top right) with your submission. See author guidelines, under 'Reporting Guidelines'. Please confirm you have submitted this list.