Genetic diversity and population structure of Commelina communis in China based on simple sequence repeat markers

2018-11-13

Journal of Integrative Agriculture, 2018, Issue 10

YANG Juan, YU Hai-yan, LI Xiang-ju, DONG Jin-gao

Abstract Commelina communis (Asiatic dayflower) is a troublesome weed in China. Genetic variation among 46 C. communis populations from different collection sites in China was investigated using 12 simple sequence repeat (SSR) primer pairs. Polymorphism analysis showed a high level of genetic diversity among these populations. Alleles (bands) were amplified by these primer pairs. The polymorphic band proportion was 18.25%, and the average polymorphism information content was 0.1330. The highest effective number of alleles was 1.9915 at locus YP33, and the lowest was 1.0000 at loci YP25 and YP31. The average observed heterozygosity of C. communis (0.8655) was higher than the average expected heterozygosity (0.1330). The populations were divided into three groups by unweighted pair-group method with arithmetic mean (UPGMA) cluster analysis (Dice genetic similarity coefficient=0.772), genetic structure analysis (K=3), and principal coordinate analysis. These results further illustrate that C. communis populations contain abundant genetic information and that the 12 SSR markers can detect microsatellite loci in C. communis genomic DNA. They also suggest that C. communis maintains high genetic diversity among different populations.

Keywords: Commelina communis, UPGMA, population structure, principal component analysis, microsatellite

1. Introduction

Commelina communis, commonly known as the Asiatic dayflower or bamboo leaf weed, is an annual to perennial monocot plant belonging to the Commelinaceae family. It germinates in late spring and mainly affects the growth of maize, soybean, wheat, vegetables, and other crops (Li 1998; Mishra et al. 2002). For example, 98% of maize and soybean growers in Northeast China have ranked C. communis among the top three troublesome weeds in recent years because of the succession of the weed community in their fields (Ma et al. 2010).

C. communis is native to temperate Northeast Asia. It is also cosmopolitan and is broadly distributed all over the world, especially in the Northern Hemisphere (Santiago and Micheal 2009). The phenotypic plasticity of C. communis allows it to maintain dominance in harsh environments, giving it a wide ecological tolerance to climatic and soil factors (Kutbay and Uckan 1998). Several factors underlie the serious abundance of C. communis. On one hand, C. communis has an aggressive growth habit, creeping along the soil and rooting adventitiously at the nodes, which increases its potential for survival (Huang et al. 2000; Pyšek 2001; Li et al. 2008; Santiago and Micheal 2009). On the other hand, C. communis is an out-crossing species, and a single plant can produce 5–80 bisexual flowers in each life cycle (Li et al. 2016). Additionally, C. communis can reproduce both vegetatively and by seed, and, like other Commelinaceae species, it is known for its opportunistic germination throughout the growing season (Prostko and Webster 2007; Gómez-Vargas 2012). Furthermore, C. communis is tolerant to most herbicides commonly used in maize and soybean fields, such as imazethapyr, acetochlor, fluroxypyr, and glyphosate (Ma et al. 2009; Santiago and Micheal 2009). The control efficacy of a given herbicide varies among C. communis populations from different locations, which allows this species to escape herbicide treatment in some areas. At present, farmers in China mainly manage C. communis by increasing herbicide dosages without knowledge of the diversity of its populations, which causes environmental pollution and herbicide carryover to subsequent crops (Santiago and Micheal 2009).

Genetic variation within local populations and shared among populations in a geographic area determines the potential for evolutionary adaptation to changing environmental conditions. The potential of weed populations to adapt to management practices depends on their genetic variation. Thus, knowledge of weed genetic diversity is essential for designing effective long-term weed management programs.

Simple sequence repeats (SSRs), also known as microsatellites, are short repeated DNA sequences whose repeat units are generally 1–6 base pairs long (most are 2–4 base pairs) (Liu et al. 2016). SSRs are distributed extensively throughout genomic DNA. In addition, SSRs have several advantages: they are typically codominant, highly polymorphic, reproducible, and highly accessible (Zane et al. 2002; Selkoe and Toonen 2006). Studies of genetic diversity and genetic structure based on SSRs have been conducted on plants, mushrooms, animals, and other organisms. For example, a total of 13 highly polymorphic and transferable cross-species SSR primers were evaluated for genetic diversity in Solanum elaeagnifolium populations; these primers laid the foundation for studying the extent of genetic diversity in S. elaeagnifolium populations (Zhu et al. 2012). The phylogenetic relationships of Echinochloa species were also successfully studied using 23 SSR primer pairs and 24 morphological traits (Lee et al. 2016).

In this study, the genetic diversity of C. communis populations was analyzed using fluorescent SSR markers. The specific objective was to quantify the genetic diversity of C. communis, which may provide information for correlating gene diversity parameters with herbicide use patterns.

2. Materials and methods

2.1. Plant materials

Samples of C. communis representing populations were collected from 8 provinces in China, mainly from corn fields by “Z” sampling or from roadsides by random sampling (Table 1). A total of 20 plants were selected from each population for analysis after one year of vegetative reproduction.

2.2. DNA extraction

C. communis genomic DNA was extracted from fresh leaves using the Tiangen DNA Extraction Kit (Tiangen Biotech (Beijing) Co., Ltd., China). The quality and concentration of the DNA were assessed using 1% agarose gel and a NanoDrop 2000 (Thermo Scientific, USA), respectively. The DNA was then diluted to 30–50 ng µL⁻¹ with TE buffer.
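The dilution step follows the usual C1V1 = C2V2 relation. As a minimal sketch, the helper below (a hypothetical name, not part of the published protocol) returns the volume of TE buffer to add to a DNA stock of known concentration. The 120 ng/µL stock and the 10 µL aliquot are illustrative assumptions, not measurements from this study.

```python
def te_volume_to_add(stock_ng_per_ul: float, stock_ul: float, target_ng_per_ul: float) -> float:
    """Volume of TE buffer (in µL) needed to dilute a DNA stock to the target concentration.

    Uses C1 * V1 = C2 * V2. Names and example values are illustrative assumptions.
    """
    if target_ng_per_ul <= 0 or stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("target must be positive and no higher than the stock concentration")
    final_ul = stock_ng_per_ul * stock_ul / target_ng_per_ul  # V2 = C1 * V1 / C2
    return final_ul - stock_ul


# Hypothetical example: 10 µL of a 120 ng/µL extract diluted to 40 ng/µL
buffer_ul = te_volume_to_add(120.0, 10.0, 40.0)
print(f"Add {buffer_ul:.1f} µL TE buffer")
```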

2.3. SSR primer screening and PCR amplification

A total of 34 SSR primers derived from the C. communis genome were used in this study (Li et al. 2015). PCR was carried out in a 20-µL reaction volume containing 1 µL of genomic DNA (30–50 ng), 0.5 or 0.6 µL each of forward and reverse primer (1.0×10⁻⁸ mol L⁻¹), and 10 µL of 2×Taq PCR Master Mix (0.1 U Taq polymerase µL⁻¹, 5.0×10⁻⁴ mol L⁻¹ of each dNTP, 2.0×10⁻² mol L⁻¹ Tris-HCl (pH 8.3), 0.1 mol L⁻¹ KCl, 3.0×10⁻³ mol L⁻¹ MgCl2; Tiangen, Beijing, China). The PCR amplification conditions were as follows: 95°C for 3 min; 35 cycles of 94°C for 30 s, annealing for 30 s, and 72°C for 30 s or 1 min; and a final extension at 72°C for 15 or 30 min. The annealing temperature was optimized for each primer pair. SSR primers with specific amplification and high polymorphism were chosen as candidate markers from these 34 primers. Among them, 12 polymorphic primers were selected and labelled with the fluorescent dye 6-FAM or HEX by Invitrogen Biotechnology Co., Ltd. (Shanghai, China). The sequences of the highly polymorphic primers and their loci in the assembled scaffolds are shown in Table 2.

Table 1 Information of Commelina communis populations

2.4. Detection of PCR products

PCR products were checked by 1.8% agarose gel electrophoresis and distinguished from primer dimers. Primers that displayed polymorphism were selected and labelled with different fluorescent dyes. These fluorescent markers were used to detect the specific loci of C. communis populations. The HUM-STR method was applied for electrophoretic analysis (capillary temperature 60°C; sample injection 2.0 kV, 30 s; electrophoresis 4.8 kV, 65 min).

Table 2 Characteristics of 12 simple sequence repeat (SSR) markers

2.5. Analysis of genetic diversity

The products amplified with the fluorescent SSR primers were detected on an ABI PRISM 3730xl DNA Sequencer with GS500 (Applied Biosystems, USA) as the internal size standard. Allele sizes were determined using GeneMarker version 2.2.0 (Applied Biosystems). The effective allele sizes were transformed into a binary [0, 1] matrix by CEQ fragment analysis software (Beckman Coulter, USA).

To evaluate the genetic diversity and genotypes of C. communis populations, we used POPGENE32 (version 1.31) to calculate the following parameters: observed number of alleles (Na) per locus, effective number of alleles (Ne) per locus, expected heterozygosity (He), observed heterozygosity (Ho), and Shannon’s information index (I) (Nei 1978; Francis et al. 1999). Dice genetic similarity coefficients were used to describe the genetic relationships among populations through unweighted pair-group method with arithmetic averaging (UPGMA) cluster analysis in NTsys-pc (version 2.10s) (Rolf 1998). Analysis of molecular variance (AMOVA) was used to partition the variation “among groups”, “among populations”, and “within populations” in GenAlEx (version 6.503) with 999 permutations (http://www.anu.edu.au/BoZo/GenAlEx). Population genetic structure was analyzed in STRUCTURE 2.3.4 using the admixture model with correlated allele frequencies (Falush et al. 2003; Pritchard et al. 2007). The burn-in period and the number of Markov Chain Monte Carlo (MCMC) replications were each set to 5 000. The Ln (likelihood) value was calculated for K from 1 to 10. Each K value was run three times, and the final result was the average (Song et al. 2003). Additionally, principal coordinate analysis (PCoA) was conducted on the Dice genetic distance matrix using the Dcenter, Eigen, and Mxplot procedures of NTsys-pc (version 2.10s).
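The per-locus parameters listed above follow directly from allele frequencies. The sketch below shows the textbook formulas for the effective number of alleles (Ne = 1/Σp²), expected heterozygosity (He = 1 − Σp²), and Shannon's information index (I = −Σp ln p). It is not POPGENE's code, and it omits any small-sample correction that POPGENE may apply. The function name and the two-allele frequencies are hypothetical; the frequencies were picked only so that the outputs fall close to the values reported for locus YP33 in the Results.

```python
import math


def locus_diversity(freqs):
    """Return (Ne, He, I) for one locus from its allele frequencies.

    Ne = 1 / sum(p^2), He = 1 - sum(p^2), I = -sum(p * ln p).
    No small-sample correction is applied (an assumption of this sketch).
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    ne = 1.0 / homozygosity
    he = 1.0 - homozygosity
    shannon = sum(-p * math.log(p) for p in freqs if p > 0)
    return ne, he, shannon


# Hypothetical frequencies for a two-allele locus (not taken from Table 4)
ne, he, shannon = locus_diversity([0.5326, 0.4674])
print(f"Ne={ne:.4f}  He={he:.4f}  I={shannon:.4f}")

# A monomorphic locus gives the lower bounds: Ne = 1 and He = I = 0
print(locus_diversity([1.0]))
```

A locus with a single allele returns Ne = 1 and He = 0, which is the pattern reported for loci YP25 and YP31.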

3. Results

3.1. Polymorphism analysis of fluorescent SSR markers

The genomic DNAs of 46 C. communis populations were amplified using 12 fluorescent SSR primers (Table 3). A total of 652 bands were amplified, of which 119 were polymorphic. The number of bands per primer varied from 6 (YP19, YP20, YP31, and YP35) to 16 (YP26), with an average of 10 bands (alleles) per primer. The polymorphic band proportion (PBP) ranged from 8.99% (YP33) to 32.61% (YP25), with an average of 18.25%. The polymorphism information content (PIC) of each marker was less than 0.5, with an average of 0.1330. The highest PIC was 0.4978 (YP33).
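As a quick consistency check on these figures, the overall polymorphic band proportion is the number of polymorphic bands divided by the total number of bands. The short sketch below, with a hypothetical helper name, recomputes it from the two totals given in the paragraph above.

```python
def polymorphic_band_proportion(polymorphic: int, total: int) -> float:
    """Percentage of amplified bands that are polymorphic."""
    if total <= 0 or not 0 <= polymorphic <= total:
        raise ValueError("need total > 0 and 0 <= polymorphic <= total")
    return 100.0 * polymorphic / total


# Totals reported in the text: 119 polymorphic bands out of 652 amplified bands
pbp = polymorphic_band_proportion(119, 652)
print(f"PBP = {pbp:.2f}%")
```

The result agrees with the reported average PBP of 18.25%.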

The allele sizes, Ne, He, Ho, and I of the 12 SSR primers across all C. communis populations are shown in Table 4. A total of 94 alleles ranging in size from 68 to 307 bp were detected by CEQ fragment analysis software (Beckman Coulter, USA). He varied among loci from 0.0000 (loci YP25 and YP31) to 0.4979 (locus YP33), with an average of 0.1330. The highest Ne was 1.9915 at locus YP33, and the lowest was 1.0000 at loci YP25 and YP31. In contrast, Ho varied among loci from 0.4967 (locus YP33) to 1.0000 (loci YP25 and YP31), with an average of 0.8655. The I and Nei’s indexes ranged from 0.0000 to 0.6910 and from 0.0000 to 0.4674, with averages of 0.2266 and 0.0915, respectively. These results indicate that primer YP33 detected the most polymorphic of the tested loci in C. communis genomic DNA.

Table 3 PCR amplification results of 12 simple sequence repeat (SSR) markers

3.2. Cluster analysis

The Dice similarity coefficient was calculated from the SSR marker data using NTsys-pc (version 2.10s). The genetic similarity coefficient ranged from 0.760 to 0.946. A dendrogram based on the SSR data was generated to show the genetic relationships (Fig. 1). On the basis of the cluster analysis, the 46 C. communis populations could be divided into three major groups (I–III) at a Dice genetic similarity coefficient of 0.772. Group I comprised the five populations from Heilongjiang Province, reflecting their similar genetic backgrounds. Group II comprised 33 populations from Jilin, Liaoning, Hebei, Jiangsu, and Zhejiang provinces. The populations collected from Hubei and Guizhou provinces were clustered in group III. These results show that populations clustered in the same group had similar genetic backgrounds and that the distribution of C. communis populations has a certain regional specialization. The closer the geographical locations, the higher the genetic similarity coefficient.
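The clustering described above can be sketched in a few lines. The example below computes the Dice coefficient, 2a/(2a + b + c), from 0/1 band profiles and then applies average-linkage (UPGMA) merging on the distance 1 − Dice. It is a plain-Python illustration, not the NTsys-pc implementation, and the four band profiles and their labels are hypothetical.

```python
from itertools import combinations


def dice_similarity(x, y):
    """Dice coefficient 2a / (2a + b + c) for two 0/1 band profiles."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # bands shared by both
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # bands only in x
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # bands only in y
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 1.0


def upgma(profiles):
    """Average-linkage clustering on Dice distance (1 - similarity).

    Returns the merges in order as (cluster_1, cluster_2, mean_similarity).
    """
    names = list(profiles)
    dist = {frozenset((p, q)): 1.0 - dice_similarity(profiles[p], profiles[q])
            for p, q in combinations(names, 2)}

    def average(c1, c2):
        # UPGMA distance: mean of all pairwise distances between members
        return sum(dist[frozenset((p, q))] for p in c1 for q in c2) / (len(c1) * len(c2))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((c1, c2, 1.0 - average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical band profiles for four populations (not data from this study)
profiles = {
    "A": [1, 1, 0, 1, 0, 1],
    "B": [1, 1, 0, 1, 1, 1],
    "C": [0, 1, 1, 0, 1, 0],
    "D": [0, 1, 1, 0, 1, 1],
}
merges = upgma(profiles)
for left, right, similarity in merges:
    print(left, right, round(similarity, 3))
```

Cutting the resulting tree at a chosen similarity level, as done at 0.772 in this study, yields the groups.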

AMOVA showed that the largest share of the variation was attributed to differences among populations. Of the total variation, 29% was among the groups defined by UPGMA clustering, 51% was among populations, and 20% was among individuals within populations (Table 5). The three UPGMA groups followed geographical location and latitude from northeast to south in China. The populations collected from Heilongjiang Province, a high-latitude area of China, clustered together in group I. Jilin, Liaoning, Hebei, Jiangsu, and Zhejiang provinces lie at intermediate latitudes, and the 33 populations from these provinces clustered in group II. Group III contained the 8 populations collected from Hubei and Guizhou provinces, which lie at lower latitudes in China.

Table 4 Genetic variation of microsatellite loci

Fig. 1 Dendrogram of Commelina communis populations based on simple sequence repeat (SSR) data using the unweighted pair-group method with arithmetic averaging (UPGMA) (codes as in Table 1).

Table 5 Analysis of molecular variance (AMOVA) among Commelina communis populations and clustered groups

3.3. Genetic structure analysis

We first selected the optimum K value (Fig. 2-A) to estimate the population genetic structure of C. communis. Fig. 2-A shows that a clear turning point and the maximum likelihood value appeared at K=3, where ΔK=m(|L″(K)|)/s[L(K)] (Evanno et al. 2005). The genetic structure at K=3 is therefore shown in Fig. 2-B. All populations were divided into three groups (I–III).
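The ΔK statistic can be illustrated with a short sketch. The version below takes the absolute second difference of the mean Ln P(D) across replicate runs and divides it by the standard deviation of Ln P(D) at that K, which is one common reading of Evanno et al. (2005). The likelihood values are hypothetical (three runs per K, as in this study's design) and are not STRUCTURE output from this work.

```python
from statistics import mean, stdev


def delta_k(lnp_runs):
    """Evanno-style delta K from replicate STRUCTURE runs.

    lnp_runs maps K -> list of Ln P(D) values, one per run.
    delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd of L(K),
    computed only for K values that have both neighbours.
    """
    means = {k: mean(v) for k, v in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 not in means or k + 1 not in means:
            continue
        spread = stdev(lnp_runs[k])
        if spread == 0:
            continue  # delta K is undefined when replicate runs are identical
        result[k] = abs(means[k + 1] - 2 * means[k] + means[k - 1]) / spread
    return result


# Hypothetical Ln P(D) values, three replicate runs per K
runs = {
    1: [-5200, -5201, -5199],
    2: [-4800, -4802, -4798],
    3: [-4300, -4301, -4299],
    4: [-4250, -4255, -4245],
    5: [-4230, -4240, -4220],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
print(scores, "best K:", best_k)
```

ΔK cannot be evaluated at the smallest or largest K tested, because each value needs a neighbour on both sides.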

In general, the genetic structure analysis revealed relationships similar to those from the cluster analysis (Fig. 2-B). When component probabilities exceeded 0.6, group I from the structure analysis closely matched group I from the cluster analysis. Likewise, the populations assigned to group II matched those in clustering group II, and group III closely matched clustering group III.

3.4. Principal component analysis

Principal component analysis was carried out on the 46 populations on the basis of the SSR data matrix (Fig. 3). The first and second principal components explained 80.71% and 2.44% of the variance, respectively, together accounting for 83.15% of the total variation. A 2D scatterplot of the 46 populations was produced from the first two principal components. In this plot, closely related populations lie close together, whereas distantly related populations lie far apart.

Fig. 2 Population structure model of 46 Commelina communis populations. A, group numbers of C. communis populations based on the mapping method. The abscissa is the K value (K=1, 2, 3, …, 10). When K=3, a significant turning point and maximum likelihood in the scatterplot are achieved. B, population structure diagram of 46 C. communis populations. The proportion of each color shows the probability of being divided into the corresponding group for each variety (groups I–III) (codes as in Table 1).

Fig. 3 2D scatterplot based on the first and second principal components of 46 Commelina communis populations (codes as in Table 1).

Principal component analysis divided the C. communis populations into three groups (Fig. 3). The populations in group I, which originated from Heilongjiang Province, were more closely related to one another than those in the other groups. Group II included the populations from Jilin, Liaoning, Hebei, Jiangsu, and Zhejiang provinces. The C. communis populations from Hubei and Guizhou provinces were clustered in group III. These results indicate that principal component analysis can directly reveal the genetic relationships among different C. communis populations.
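The Methods describe this ordination as a principal coordinate analysis run on a distance matrix through double-centring and eigen-decomposition. A minimal sketch of those two steps is given below, using Gower double-centring and power iteration for the first axis only. It is not the NTsys-pc code; the function names and the three-sample distance matrix are hypothetical.

```python
def double_center(dist):
    """Gower double-centring of a symmetric distance matrix: B = -1/2 * J * D^2 * J."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals the column means for symmetric input
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]


def leading_axis(b, iterations=100):
    """Power iteration: dominant eigenvalue and unit eigenvector of the symmetric matrix b."""
    n = len(b)
    # Start from a non-constant vector; the all-ones vector lies in the null space of b.
    v = [float(i + 1) for i in range(n)]
    value = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
        value = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return value, v


# Hypothetical distances among three samples (not data from this study)
dist = [[0.0, 1.0, 3.0],
        [1.0, 0.0, 2.0],
        [3.0, 2.0, 0.0]]
b = double_center(dist)
eigenvalue, axis = leading_axis(b)
# The trace of b equals the sum of its eigenvalues, so this ratio is the share of
# variation on the first axis (meaningful when no eigenvalue is negative).
share = eigenvalue / sum(b[i][i] for i in range(len(b)))
print(f"first axis explains {100 * share:.1f}% of the variation")
```

The percentages reported for the first two axes correspond to this kind of ratio, with each eigenvalue divided by the sum of all eigenvalues.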

4. Discussion

Capillary electrophoresis has developed rapidly in recent years as a detection technology for molecular markers. It is fast, efficient, and highly automated, and it has become a focus of life science research. Many studies have shown that, in DNA fragment polymorphism analysis, capillary electrophoresis provides more comprehensive and accurate information on length and sequence polymorphism than other existing technologies. Liang et al. (2015) analysed the germplasm resources, genetic diversity, and relationships of 133 persimmon populations from China, Japan, and America using capillary electrophoresis and obtained 158 alleles and 610 genotypes from 17 SSR loci. In our study, the genetic diversity of C. communis populations was analysed by capillary electrophoresis, and 12 highly polymorphic, nuclear, single-locus, codominant SSR markers were selected from the 34 SSRs initially developed for C. communis (Li et al. 2015) to assess the genetic diversity and population structure of plants from different locations. The method successfully distinguished the genotypes of C. communis used in the analysis of genetic diversity.

In our study, genetic diversity and the degree of genetic differentiation were examined at the DNA level. A high degree of polymorphism was found among C. communis populations. UPGMA cluster analysis, genetic structure analysis, and principal component analysis gave similar results, each dividing the 46 C. communis populations into three major groups, which indicates that each method can be used to analyse the genetic diversity of C. communis populations. Li et al. (2016) found that geographical location played a more important role than soil composition in structuring genetic variation among populations of the pseudometallophyte C. communis. In our research, AMOVA showed that “among populations” variation was the largest source of genetic variation, exceeding the variation among the three groups and within populations. Our data therefore further suggest that geographical location might play a more important role than other factors in the assessed populations. These results are consistent with previous studies.

Li et al. (2016) reported that metallicolous populations of C. communis with a higher number of private alleles usually showed a higher level of genetic variability. In our study, primer YP33 gave a PBP of 8.99% and the highest PIC (0.4978) among all primers (Table 3). The genetic variation of the microsatellite loci was also analysed, and primer YP33 detected variation most effectively. The YP33 locus gave the largest allele sizes (287–307 bp) and the highest effective number of alleles (1.9915), expected heterozygosity (0.4979), Shannon’s information index (0.6910), and Nei’s gene diversity (0.4674). However, its observed heterozygosity was the lowest, at 0.4967 (Table 4). These results might be related to mutations in the C. communis genome. Similar results have been reported in parasitic weeds: an assessment of the genetic diversity of 50 Orobanche cumana populations using 15 SSR markers revealed low inter- and intra-population variability (Pineda-Martos et al. 2013).

The spatial genetic structure of C. communis might be determined by gene flow. With the transport of food, seeds and plants of C. communis are carried from one place to another. In this study, we detected shared ancestry among C. communis populations using STRUCTURE. Several populations might share alleles with other populations. For example, group II contained 33 populations from Jilin, Liaoning, Hebei, Jiangsu, and Zhejiang provinces, which are widely distributed from northeast China to the Yangtze River. The genetic structure results therefore suggest that regional specialization may contribute substantially to the high genetic diversity of C. communis populations. Studying the genetic diversity, genetic relationships, and regional specialization of C. communis will provide a theoretical basis for delaying the development of resistance in C. communis and for managing this weed.

Seed banks hold large amounts of genetic diversity (Mandák et al. 2012). According to laboratory statistics, each C. communis plant can produce 500–1 000 seeds. Mature C. communis seeds readily fall to the ground and become buried in the soil. These seeds form a massive seed bank that maintains genotypes over time and holds a large amount of genetic diversity. More than 80% of the seeds can germinate even after a few years in the soil seed bank (Takabayashi and Nakayama 1978). We are currently investigating the relationship between the C. communis seed bank and population genetic diversity and intend to report on this subject in the future.

5. Conclusion

The genetic diversity and population structure of 46 C. communis populations were successfully analyzed using 12 fluorescent SSR markers with capillary electrophoresis. The mean number of polymorphic bands was 10, and the polymorphic band proportion was 18.25%. The average PIC was 0.1330. The highest effective number of alleles was 1.9915, at locus YP33. He varied among loci from 0.0000 to 0.4979, with an average of 0.1330, whereas Ho varied from 0.4967 to 1.0000, with an average of 0.8655. The I and Nei’s indexes ranged from 0.0000 to 0.6910 and from 0.0000 to 0.4674, with averages of 0.2266 and 0.0915, respectively. On the basis of UPGMA cluster analysis, genetic structure analysis, and principal component analysis, these populations were divided into three major groups.

Acknowledgements

This study was funded by the National Key Research and Development Program of China (2016YFD0300701) and the earmarked fund for China Agriculture Research System (CARS-25). The authors gratefully acknowledge Dr. Cui Hailan, Dr. Yu Huilin, Dr. Huang Hongjuan, Dr. Huang Zhaofeng, and Research Assistant Quan Zonghua (Institute of Plant Protection, Chinese Academy of Agricultural Sciences) and Prof. Zhou Xiaogang (Institute of Plant Protection, Sichuan Academy of Agricultural Sciences, China) for their help in collecting plant materials and for technical assistance. We also sincerely thank the editors.