APP下载

Screening of Postpartum Depression Diagnostic Markers Based on Immune-Related Genes and Immune Infiltration

2022-03-23YiDanSunXiaoJiangLiPeiYingYangYingJieJia

Psychosomatic Medicine Resesrch 2022年1期

Yi-Dan Sun,Xiao-Jiang Li,Pei-Ying Yang,Ying-Jie Jia*

1First Teaching Hospital of Tianjin University of Traditional Chinese Medicine,Tianjin,China.

Abstract

Keywords: Postpartum Depression, Diagnostic markers, Immune-related genes, Immune infiltration

Background

Postpartum depression (PPD) is a mild to severe non-psychotic depressive episode, and it is one of the main factors leading to pregnancy-related morbidity and mortality [1]. Maternal depression(before or after childbirth) is associated with negative health-related behaviors and adverse consequences, including psychological and developmental disorders in infants, children, and adolescents. More and more evidences show that congenital and acquired have a major influence on the pathogenesis of neuropsychiatric diseases or psychopathology [2]. PPD is a mental disorder that has not been fully diagnosed and treated [3]. It has a negative impact on the health of mothers and newborns. Compared with women without polycystic ovary syndrome, women with polycystic ovary syndrome are more prone to a variety of pregnancy complications, including hypertension, gestational diabetes, and preterm birth [2]. However,there is currently limited research on whether polycystic ovary syndrome is related to anxiety and depression during pregnancy, and whether this increases the risk of postpartum depression in women.Genetically speaking, postpartum women can be explained by the lack or presence of certain genetic variants that lead to increased risk. We used PPD expression profiles from Gene Expression Synthesis (GEO)data for the messenger RNA (mRNA) expression profile matrix. The differentially expressed genes of the GSE10558 data set were selected and analyzed. Perform gene ontology (GO) enrichment and gene set variation analysis (GSVA) for annotation, visualization, and integrated discovery. And further predicted the drug target of the hub gene and the target of small molecule compounds, and constructed a network.Although PPD is the main cause of disability worldwide, its molecular mechanism is still unclear. As we all know, neurotransmitters regulate immune response and anti-immune response, and also affect the immune microenvironment through this indirect mechanism.Different subsets of immune cells migrate into the microenvironment and modulate the immune response in a spatiotemporal manner. It is speculated that the immune cells in the microenvironment may play some important roles in the occurrence and development of PPD.Recent evidence suggests that immune activation is counteracted by microenvironmental elements that promote immune tolerance and prevent uncontrolled inflammation. In order to explore whether the immune microenvironment plays a role in the occurrence and development of PPD, the study further analyzed the immune infiltration of PPD samples. The CIBERSORT and ESTIMATE algorithms were used to analyze the immune infiltration patterns including ovarian samples to construct an immune cell infiltration(ICI) score. Then we screened the differentially expressed genes(DEGs) clustered with 3 groups of immune subtypes, and constructed a protein-protein interaction (PPI) and mRNA-miRNA-TF molecular interaction network. Based on the intersection of the phenotypic gene set, the pivot gene was identified. Finally, evaluate the expression differences of Hub genes between the data set groups, and generate receiver operating characteristic (ROC) curves to verify the diagnostic value of DEGs. Finally, genes with high area under the curve (AUC)values are validated.

Materials and methods

Data download and data preprocessing

The GEO query package of R software (version 3.6.5,http://r-project.org/) [4] was used to download samples from the GEO (https://www.ncbi.nlm.nih.gov/geo/) database. The reliable source mRNA expression profile data set GSE10558 [5], the platform is based on GPL341 [RAE230A] Affymetrix Rat Expression 230A Array, the samples in the data set are derived from mice, including 8 testis samples and 6 ovarian samples, all of which are included in this study. The normalize Between Arrays algorithm is used to perform background correction and data normalization processing on the two data sets, and the gene expression matrices of the two data sets are obtained respectively and displayed in box plots.

Screening of DEGs

The limma package [6] was used to screen the DEGs between the 8 testis samples and 6 ovarian sample groups in the GSE10558 data set,and use the ggplot2 package to draw the heat map and volcano map of DEGs to show the differential expression of DEGs. In addition, we obtained immune-related genes from the ImmPort database(https://www.immport.org/) [7]. The database aggregates raw data from clinical trials, mechanism studies, and cellular and molecular measurements. In order to facilitate data transmission, templates for data representation and standard operating procedures have also been created, and these data have been standardized for data processing.And extracted these immune-related gene expression data from the mRNA expression matrix of the GSE10558 data set, and then screened the immune-related differential genes between groups again. All the above DEGs satisfyPvalue <0.05 and |log2FC|>0.5. A total of 222 differential genes were screened out, of which 18 were differential genes with adj(P<0.05).

GSEA and GSVA enrichment analysis

We performed GSEA enrichment analysis between testis samples and ovarian sample groups in the GSE10558 data set (the background set uses c2.cp.v7.2.symbols.gmt). Gene set variation analysis (GSVA) is a non-parametric and unsupervised gene set enrichment method that can estimate the scores of certain pathways or signatures based on transcriptome data. GSVA is a non-parametric, unsupervised method for estimating the variation of gene set enrichment from samples of expression data set. The algorithm converts the data from a sample-by-sample matrix gene set to a sample-by-sample matrix gene set, thereby allowing evaluation of the pathway enrichment of each sample. This enables the application of standard analysis methods in a pathway-centric manner, such as functional enrichment, survival analysis, clustering, CNV pathway analysis, or cross-organization pathway analysis. At the same time, we performed GSVA enrichment analysis between groups[8](the background set uses the pathway set,includingc2.cp.kegg.v7.0.symbols.gmt;c2.cp.reactome.v7.0.symbols.gmt; c2.cp. biocarta.v7.0.symbols.gmt;c2.cp. pid.v7.0.symbols.gmt, as well as msigdb.v7.0.symbols.gmt and immune-related gene set c7.all.v7.0 .symbols.gmt).

Evaluation and analysis of immune cell infiltration

CIBERSORT is based on the principle of linear support vector regression to deconvolve the transcriptome expression matrix, so as to estimate the composition and abundance of immune cells in the mixed cells [9]. We upload the gene expression matrix data to CIBERSORT,filter out samples withP<0.05, and get the immune cell infiltration matrix.Pheatmappackage(https://CRAN.R-project.org/package=pheatmap) is used to draw heat maps to show the distribution of 22 immune cell infiltration in each sample; corrplot package draws related heat maps for visualization. The correlation of 22 kinds of immune cell infiltration.The ggplot2 package draws a violinplot to visualize the differences in the infiltration of 22 immune cells.

Construction of immune characteristic subtypes

We performed single sample gene set enrichment analysis (ssGSEA)[10] on the GSE10558 data set to evaluate the degree of immune cell infiltration. ssGSEA is an extension of the GSEA method, which is mainly designed for a single sample that cannot be used for GSEA.GSVA package can realize ssGSEA analysis. Next, use the hclust function to perform hierarchical clustering of the samples, divide them into three clusters of high,medium, and low immune scores,and display them in a dendrogram.And use the limma package to compare the three clusters pairwise, analyze the DEGs between the groups, and obtain a total of 106 DEGs.

GO enrichment analysis

GO function annotation analysis is a common method for large-scale gene enrichment research, including biological process (BP),molecular function (MF)and cellular component (CC) [11-13]. For the above analysis, we used the clusterProfiler package[14] for the above 106 DEGs to perform GO, p.adjust <0.05&q value <0.05 is considered statistically different. The gene set enrichment analysis(GSEA) analysis was also performed on the gene expression matrix through the clusterProfiler package, and "c2.cp.v7.2.symbols.gmt" was selected as the reference gene set, andP<0.05 was considered to be significantly enriched.

Construction of molecular interaction network

We constructed a molecular interaction network for 106 DEGs between immune clustering groups. First, perform PPI network analysis through STRING database [15], and use Cytoscape [2] for visualization. Here we use Cistrome DB (http://cistrome.org/db/#/)[7] to collect a total of 30,451 human and 26013 mouse transcription factors, histone modifications and chromatin accessibility samples,which can be said to be the current, the most comprehensive study of ChIP-seq and DNase-seq databases. We can view transcription factor-regulated genes in Cistrome DB, detailed data annotations,analysis results and detailed information of a single data set (data QC status, motif analysis results, potential target gene prediction), and also in the genome browser View the distribution of the data and download the analysis result files, etc. And we predicted transcription factors for 106 DEGs, constructed a transcription factor regulatory network. For DEGs, we use ENCODE [16], TarBase [17], miRecords[18], miRTarBase [11] databases to predict miRNAs interacting with target genes, and protein-drug network predictions. Protein and drug target information is collected from the DrugBank database (version 5.0) (https://go.drugbank.com/) [12].

Analysis of phenotypic differences

Genecards database is the most comprehensive and authoritative database of human gene annotation information. It uses genes as the center to realize automatic mining and integration of data and information. It provides more than 73,000 human gene entries including protein coding, pseudogenes, RNA, etc. and includes genetic trajectory, clustering and unclassified information. We collected autophagy (search term: autophagy, a total of 5,910 related genes),nervous system (emotional stability search term: neuroticism, a total of 1089 related genes), endoplasmic reticulum stress (search term:endoplasmic reticulum stress, a total of 6993 related genes) and mitochondria (search term: mitochondrial stress, a total of 8827 related genes) stress-related gene set, and GSE10558 Intersection of 106 DEGs among the high, medium, and low score groups of the ICI immune clustering is shown PPI network. We took the intersection of 18 significantly DEGs and phenotypic gene sets. Furthermore, the phenotypic-related differential genes obtained after the intersection were taken again and the four common genes were finally obtained,which were regarded as Hub genes and retrogradely analyzed for further analysis.

Statistical analysis

All data processing and analysis are done through Excel (Microsoft)and R software (version 4.0.2). For the comparison of two groups of continuous variables, the statistical significance of normally distributed variables is estimated by independent Student's t test, and the differences between non-normally distributed variables are analyzed by Mann-WhitneyU test (ie Wilcoxon rank sum test).Chi-square test or Fisher's exact test is used to compare and analyze the statistical significance between two groups of categorical variables. The Kruskal-Wallis test was used to compare the two groups and the Wilcoxon test was used to compare the two groups. Use R's pROC package to draw the ROC curve, and calculate the AUC to evaluate the accuracy of the risk score to estimate the prognosis. All statisticalPvalues are two-sided, andP<0.05 is considered statistically significant. Two-tailedP<0.05 is considered statistically significant.

Results

Data download and data preprocessing

We perform background correction on the data set, normalize the data, download the reliable mRNA expression profile data set GSE10558, the platform is based on GPL341 [RAE230A] Affymetrix Rat Expression 230A Array, the samples in the data set are from mice,including testes 8 samples and 6 ovarian samples were included in this study. Perform background correction and data normalization processing on the two data sets through the normalize Between Arrays algorithm to obtain the gene expression matrices of the two data sets respectively, and display the gene expression before and after data normalization in a box plot (Figure 1).

Evaluation and analysis of immune cell infiltration

The CIBERSORT algorithm is based on the principle of linear support vector regression to express the matrix of the transcriptome, so as to estimate the composition and abundance of immune cells in the mixed cells. We use the CIBERSORT algorithm to deconvolve the GSE10558 gene expression matrix to obtain the immune cell infiltration matrix.Calculate the proportion of immune cells through the par function and draw a histogram (Figure 2A). The corrplot package draws correlation heat maps to visualize the correlation of 22 kinds of immune cell infiltration (Figure 2B), all of which are positive correlations. The ggplot2 package draws a violin chart to visualize the differences in the infiltration of 22 immune cells (Figure 2C). The results show that there are significant differences in B cells between the groups. And use the pheatmap package to draw a heat map to show the distribution of 22 kinds of immune cells infiltrated in each sample(Figure 2D).

Construction of immune characteristic subtypes

We performed single sample gene set enrichment analysis(ssGSEA) on the GSE10558 data set to evaluate the degree of immune cell infiltration. Use the hclust function to perform hierarchical clustering of samples, divide them into three clusters of high, medium, and low immune scores, and display them in a dendrogram (Figure 3B) and a PCA cluster diagram (Figure 3C). And use the limma package to compare the three clusters pairwise, analyze the DEGs between the groups, and obtain a total of 106 DEGs. We drew a heat map of the differences in immune cells and immune phenotypes between the high, medium, and low scores of ICI immune clustering groups of GSE10558 (Figure 3A).

Construction of molecular interaction network and Hub genes screening

The 18 DEGs between the 8 testicular samples and 6 ovarian sample groups in the GSE10558 data set were screened through the limma package, and it was found that these genes were all included in the ssGSEA analysis of the immune gene set prediction score clustering.After clustering, the three clusters were compared and analyzed for each group. Within the 106 DEGs obtained. We use ENCODE,TarBase,miRecords, and miRTarBase databases to predict miRNAs and transcription factors that interact with target genes (Figure 4A), and protein-drug network predictions (Figure 4B). Subsequently, we analyzed the correlation between APOA1, PLN, PRKCZ, TRPV2 and GSE10558 immune clustering, and drawn a scatter plot. And further analyzed the expression differences of APOA1, PLN, PRKCZ, TRPV2 between the GSE10558 testis sample and the ovarian sample group,and displayed it as a box diagram.

We then took the intersection of 18 significantly differently expressed genes and phenotypic gene sets. Furthermore, the phenotypic-related differential genes obtained after the intersection were taken again and the four common genes were finally obtained,which were regarded as Hub genes and retrogradely analyzed for further analysis(Figure 5).

GO enrichment analysis

We performed GO on the above 106 DEGs using the clusterProfiler package, and p.adjust <0.05&q value <0.05 is considered to be statistically different. Draw bar graphs, circle graphs, bubble graphs and chord graphs (Figure 6). And listed the corresponding enrichment result list (Table1).

Figure 1:Data set standardization.(A) GSE10558 data set mRNA expression profile matrix box plot before calibration; (B)GSE10558 data set mRNA expression profile matrix box plot after calibration.

Figure 2:Immune infiltration analysis. (A) CIBERSORT analyzed the accumulation histogram of the contents of 22 immune cells in each sample of the GSE10558 data set; (B) the heat map of the co-expression of 22 immune cells in the GSE10558 data set; (C) the testis sample and ovarian sample group in each sample of the GSE10558 data set Violin image of the expression difference of 22 kinds of immune cells; (D) Heat map of the expression difference of 22 kinds of immune cells between the testis sample and the ovarian sample group in each sample of the GSE10558 data set.

Figure 3:Immunity model construction. (A) GSE10558 ICI immune clustering high, middle and low scores of immune cells and immune phenotype related difference heat map;(B)GSE10558 immune clustering dendrogram;(C)GSE10558 ICI immune clustering high Visualization of PCA between middle and low score groups.

Figure 4:Molecular network construction. (A) 106 DEGs among the high,medium, and low score groups of the ICI immune clustering were used as target genes to predict miRNA and TF in the ENCODE,TarBase,miRecords,and miRTarBase databases to construct an mRNA-miRNA-TF network; (B) Protein-drug target prediction of Hub genes (only 2 DEGs The translated protein has a corresponding drug target).

Figure 5:Analysis of phenotype-related differences.(A) Venn diagram of the intersection of autophagy-related genes and 18 DEGs; (B)Venn diagram of the intersection of nervous system (emotional stability)-related genes and 18 DEGs;(C) Endoplasmic reticulum stress-related genes and 18 Venn diagram of the intersection of DEGs; (D) Venn diagram of the intersection of mitochondrial stress-related genes and 18 DEGs.

Figure 6:GO enrichment analysis.(A)GO enrichment analysis bar graph,the length of the column represents the amount of gene enrichment,the color represents significance, and the significance gradually increases from yellow to blue; (B) GO enrichment analysis analysis circle diagram; (C) GO enrichment analysis bubble chart, the size of the bubble represents the number of gene enrichment, the color represents the significance, and the significance gradually increases from yellow to blue; (D) GO enrichment analysis string chart.

Table 1 Functional and pathway enrichment analysis of the overlap DEGs

Figure 7:GSEA enrichment analysis.(A) The set of up-regulated genes between GSE10558 testis sample and ovarian sample group, the selected reference gene set is c2.cp.v7.0.symbols.gmt;(B)The set of down-regulated genes between GSE10558 testis sample and ovarian sample group.

GSEA and GSVA enrichment analysis

We performed GSEA enrichment analysis between testis samples and ovarian sample groups in the GSE10558 data set (the background set uses c2.cp.v7.2.symbols.gmt) (Figure 7A-B).GSVA is a non-parametric and unsupervised gene set enrichment method that can estimate the scores of certain pathways or signatures based on transcriptome data.At the same time, we performed GSVA enrichment analysis between groups (the background set uses the pathway set, including c2.cp.biocarta.v7.0.symbols.gmt (Figure 8A); c2.cp.kegg.v7.0.symbols.gmt(Figure8B);c2.cp.pid.v7.0.symbols.gmt(Figure8C);c2.cp.reactome.v7.0.symbols.gmt (Figure 8D).

Diagnostic marker correlation analysis

We further analyzed the expression differences of APOA1, PLN,PRKCZ, TRPV2 between the GSE10558 testis sample and the ovarian sample group, and displayed them in a box diagram (Figure 9), and the results were significantly different. On this basis, the ROC curve was used to verify the diagnostic efficiency, APOC4 - AUC = 1, PLN -AUC = 1, PRKCZ-AUC = 0.938, TRPV2-AUC = 0.938 (Figure 10).AUC >0.7 indicates higher diagnostic efficiency, and the results indicate that the selected diagnostic markers have higher diagnostic value.

Figure 8:Gene set variation analysis results of different reference genomes.(A) GSVA enrichment difference heat map in Biocarta database between GSE10558 testis sample and ovarian sample group; (B) GSVA enrichment difference heat map in KEGG database between GSE10558 testis sample and ovarian sample group; (C) GSE10558 testis sample Heat map of GSVA enrichment difference in PID database between GSE10558 and ovarian sample group; (D) Heat map of GSVA enrichment difference in Reactome database between GSE10558 testis sample and ovarian sample group.

Figure 9: Differences in the expression of diagnostic markers among different groups. (A) Box plot of the expression difference of APOA1 between the testis sample and the ovarian sample group; (B) Box plot of the expression difference of PLN between the testis sample and the ovarian sample group; (C) PRKCZ between the testis sample and the ovarian sample group (D) Box plot of the expression difference of TRPV2 between the testis sample and the ovarian sample group.

Figure 10:ROC curve verifies the diagnostic efficiency. (A) ROC curve of APOA1 between testis sample and ovarian sample group, AUC=1;(B) ROC curve of PLN between testis sample and ovarian sample group, AUC=1; (C) PRKCZ between testis sample and ovarian sample ROC curve between groups, AUC=0.938; (D) ROC curve of TRPV2 between testicular sample and ovarian sample group, AUC=0.938.

Discussion

With the increasing incidence of PPD, understanding the pathology and molecular mechanisms of PPD is essential for clinical diagnosis and treatment. There is increasing evidence that perinatal depressive symptoms are related to adverse maternal and infant health outcomes.However, the etiology associated with this type of depressive psychopathology is poorly understood, but understanding the differences between individuals may provide insights into the genomic regions that are responsible for this type of depressive psychopathology [19]. Efficient high-throughput sequencing and bioinformatics analysis help us understand the molecular mechanisms of disease occurrence and development, which are necessary for exploring genetic changes and identifying potential diagnostic biomarkers. Therefore, in this study, we analyzed the mRNA expression profile matrix data through bioinformatics methods. First,we identified 222 DEGs with statistically significant differences in the GSE10558 data set, of which 18 DEGs have significant differences. GO analysis showed that most of the 18 significantly differentially expressed genes were rich in receptor ligand activity and cytokine receptor binding. It is worth noting that these genes are also enriched in functional areas related to immune inflammatory response and immune cell regulation. Studies have shown that the immune system may be involved in the pathogenesis of PPD. The immune phenotype analysis of immune cell disorders and disturbances in the immune subgroups of PPD patients reveals internal changes that lead to inflammatory tendencies. The center of the immune mediator is involved in the breakdown of the blood-brain barrier, which results in the presence of complement proteins[20,21],inflammatory activation[11,22-24], infiltration of CD4+ and CD8+ T lymphocytes and PPD patients with autoantibodies to neuronal antigens in the substantia nigra [25-27]. However, the precise molecular mechanism of the death of dopaminergic neurons through the immune pathway is still unclear.

In order to study the biological functions of DEGs related to PPD,the GSVA package was used for GSVA analysis, and common activation and inhibition signal pathways were selected from GEO10558. The results show that these genes are significantly enriched in growth factor binding and other aspects. In this study, by mining the phenotypic gene set, the DEGs that are significantly related to PPD were intersected, and four overlapping genes APOA1, PLN,PRKCZ, and TRPV2 were obtained as the most important pivot genes.CIBERSORT has been widely used to evaluate the relative content and dynamic regulation process of 22 immune cells. In our research, we obtained immune clustering groupings based on ssGSEA analysis.DEGs found in high, medium, and low immune score groups were mainly enriched in immune inflammatory response and immune cell regulation through GO analysis. We found that memory B cells of 22 immune cells were significantly different between the PPD group and the normal control group. We analyzed the four APOA1, PLN, PRKCZ,and TRPV2 that are significantly related to PPD through ROC curves and showed significant prediction accuracy, and all AUCs are above 0.9, indicating that these new biomarkers can be further studied in PPD.

Subsequently, we also showed the expression differences between tissue samples through box plots. The results show that there are significant differences in expression of these genes between groups,which may serve as new potential targets for the diagnosis and treatment of PPD. It can be seen that the APOA1, PLN, PRKCZ, and TRPV2 discovered in this study may play an important role in the occurrence and development of PPD through immune inflammatory response.

Although this study has achieved some new results in the diagnosis and differentiation of PPD, there are still some shortcomings. A single data set may lead to high false positive rates and one-sided results.Second, our data comes from public databases, and the sample size is too small to assess the quality of the data. In addition, we only identified a few molecules most related to PPD in mouse samples.Whether they are related to the course and severity of the disease needs further verification. At the same time, animal and cell experiments are needed to further prove these results.

Conclusion

These results highlight the important role of diagnostic markers in the pathogenesis of PPD, and their expression is markedly heterogeneous among PPD patients. Based on immune signatures, we identified three HCC subtypes, immune high (Immunity_H), immune medium(Immunity_M),and immune low(Immunity_L),and confirmed that the classification is reliable and predictable. DEGs between groups were enriched in immune-related functions and pathways such as immune inflammatory response and immune cell regulation. Immune cell-based immune scores also suggested significant distributional differences in samples with abnormal ovarian function. Through further molecular interaction and molecular diagnostic efficacy verification, we found that the molecular markers APOA1, PLN,PRKCZ and TRPV2, which are closely related to immune cell function,can efficiently identify PPD. indicate that this new biomarker panel can be further developed in PPD research. A diagnostic prediction model composed of these four immune function-related genes can distinguish PPD patients with different immune status. This discovery contributes to a more comprehensive understanding of the molecular mechanisms driving the occurrence and development of PPD, which is critical for improving the diagnosis, prognosis and treatment of this disease.