
Validity and Redundancy of Spectral Data in the Detection Algorithm of Sucrose-Doped Content in Tea

2022-11-07 · LIU Mengxuan, WU Qiong, WANG Xuquan, CHEN Qi, ZHANG Yonggang, HUANG Songlei, FANG Jiaxiong

Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2022, Issue 11

LIU Meng-xuan, WU Qiong, WANG Xu-quan, CHEN Qi, ZHANG Yong-gang, HUANG Song-lei*, FANG Jia-xiong*

1. State Key Laboratories of Transducer Technology, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China
2. Key Laboratory of Infrared Imaging Materials and Detectors, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China
3. ShanghaiTech University, Shanghai 201210, China
4. University of Chinese Academy of Sciences, Beijing 100049, China
5. Technology Center of Hefei Customs District, Hefei 245000, China

Abstract Near-infrared spectroscopy (NIRS) combined with a Genetic Algorithm-Back Propagation (GA-BP) neural network was used to determine the sucrose-doped content of 162 tea samples in the NIR wavelength range of 1~2.5 μm. The parameters of the GA and the BP neural network were optimized on the sample set, and the validity and redundancy of the spectral bands were then analyzed. The raw 1~2.5 μm data were divided into 1~1.7, 1~1.3, 1.3~1.7, 1.7~2.5 and 2~2.2 μm sets, and the established quantitative detection model was trained on each band at the same resolution. The prediction results show that, for the target content, data redundancy exists in both the 1~1.7 and 1~2.5 μm bands, and an effective model can be built using only the 1.3~1.7 or the 1.7~2.5 μm band. Prediction models were also built for the same band at spectral resolutions from 2 to 20 nm. In the 1~2.5 μm range, R was between 0.9 and 0.95 and RMSEP between 1.7 and 2.1, while in the 1~1.7 μm range, R was between 0.9 and 0.93 and RMSEP between 1.95 and 2.25. These results indicate that, for the target content, the 1~2.5 and 1~1.7 μm bands are redundant in both wavelength range and spectral resolution. Through analysis of spectral features and algorithm modeling, the effectiveness of spectral data acquisition can be improved greatly: for detecting sucrose-doped content in tea, a much narrower wavelength range and a lower spectral resolution can be adopted.

Keywords Genetic algorithm; BP neural network; Near-infrared spectroscopy; Validity; Tea

Introduction

Tea is one of the world’s most popular beverages, valued for its distinctive flavor and high nutritional value. Over hundreds of years, tea consumption has expanded worldwide, and adulteration has become frequent[1]. In particular, exported green tea is sometimes artificially adulterated with sucrose, so it is necessary to detect the sucrose-doped content in tea. At present, sensory evaluation and wet chemical analysis are still commonly used to judge whether tea has been artificially mixed with sucrose. However, both methods have disadvantages. Sensory evaluation is easily affected by factors such as environmental variation and personal subjectivity, and it lacks reproducibility and objectivity[2]. Wet chemical analysis relies on precision instruments such as liquid chromatography-mass spectrometry and high-performance liquid chromatography[3], which are costly, time-consuming and labor-intensive. Hence, rapid and low-cost methods would be highly beneficial to the tea industry and regulatory bodies.

NIRS is a rapid, nondestructive, green analytical technology suitable for large-scale inspection. Combined with suitable chemometric methods, it has been used to establish prediction models for tea categories, grades and the contents of different ingredients[4-11]. However, few studies have addressed the detection of sucrose-doped content in tea. The GA-BP neural network offers strong nonlinear learning ability, strong feature extraction ability and strong model expression ability, so it can extract the feature information of a target component from the NIR spectrum of a multi-component substance. Therefore, the objective of this study is to explore the application of NIRS and the GA-BP neural network to detecting sucrose-doped content in tea.

NIR spectrometers with high resolution and a wide wavelength range capture more information but also more noise. To avoid losing characteristic information, the full spectral data are often used to build a prediction model. However, this approach may introduce more noise and data redundancy, which can prevent the model from achieving its best prediction performance[12]. To optimize the prediction results and reduce detection cost, it is necessary to study the redundancy of full-range spectral modeling in terms of both wavelength range and resolution.

Spectral band redundancy can be studied in two ways. The first divides the spectrum according to the wavelength ranges of portable NIR spectrometers and then examines band redundancy and the effect of resolution. The second selects bands from the characteristic intervals of the target substance where interference from other components is small, and builds the model on those bands. Both approaches were adopted here to build detection models with high predictability and to explore the validity and redundancy of the spectral data.

In this paper, the spectra of 162 tea samples mixed with sucrose were collected with an FT-NIR spectrometer, and a GA-BP neural network model was applied to analyze the validity and redundancy of different spectral bands and resolutions. Furthermore, whether an NIR spectrometer with a narrow wavelength range and lower resolution can detect the sucrose-doped content in tea is also examined.

1 Experiment

1.1 Experimental Materials

A total of 162 experimental samples were prepared from Huangshan export green tea in accordance with GB/T 8302—2013[13] to ensure consistency, and their NIR spectra were measured with an FT-NIR spectrometer in diffuse reflectance mode, recorded as absorbance. The scanning wavenumber range was 4 000~10 000 cm⁻¹ (corresponding to 1~2.5 μm), with a wavenumber interval of 0.48 cm⁻¹. The reference sucrose-doped content of the samples, measured by high-performance liquid chromatography according to GB 5009.8—2016[14], ranged from 0.91% to 22.6%.
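The scan range is specified in wavenumbers, whereas the analysis works in wavelength. As a quick check of how the two axes relate, the following Python sketch (NumPy assumed; the function names are illustrative, not from the paper) converts wavenumber to wavelength with λ(nm) = 10⁷/ν̃(cm⁻¹) and estimates the wavelength spacing produced by a constant 0.48 cm⁻¹ step from Δλ ≈ λ²Δν̃/10⁷.

```python
import numpy as np


def wavenumber_to_wavelength_nm(wavenumber_cm):
    """Convert wavenumber (cm^-1) to wavelength (nm): lambda = 1e7 / nu."""
    return 1e7 / np.asarray(wavenumber_cm, dtype=float)


def wavelength_step_nm(wavenumber_cm, step_cm=0.48):
    """Approximate wavelength spacing (nm) produced by a constant wavenumber step.

    Uses |d(lambda)/d(nu)| = 1e7 / nu**2 = lambda**2 / 1e7 (lambda in nm).
    """
    lam = wavenumber_to_wavelength_nm(wavenumber_cm)
    return lam ** 2 * step_cm / 1e7


# Endpoints of the 4 000~10 000 cm^-1 scan range
nu = np.array([10_000.0, 4_000.0])
print(wavenumber_to_wavelength_nm(nu))  # [1000. 2500.]
print(wavelength_step_nm(nu))           # roughly 0.048 nm and 0.3 nm
```

The native sampling interval is therefore well below 2 nm over the whole range, so working at a 2 nm interval presumably involves resampling, which the paper does not detail.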

1.2 Research methods

The original 1~2.5 μm spectra were preprocessed by multiplicative scatter correction (MSC)[1]. Because common handheld spectrometers based on InGaAs detectors typically cover about 1~1.7 or 1.7~2.5 μm at lower resolution, the raw data were divided into 1~1.7 and 1.7~2.5 μm bands. The NIR spectra of the samples show a characteristic peak around 1.4 μm; however, moisture may also absorb around 1.4 μm. To distinguish whether this peak arises from the target component or from interference by other components, and to further study the redundancy of the 1~1.7 μm band, the 1~1.7 μm band was further divided into 1~1.3 and 1.3~1.7 μm. Because the tea spectra also contain contributions from other substances, especially moisture, the characteristic band of 2~2.2 μm was selected within the second (1.7~2.5 μm) band by relative value analysis. The overall research scheme is shown in Fig.1.

Using the GA-BP neural network quantitative detection model, the investigation covered the effectiveness of each spectral band at the same resolution and the effect of spectral resolution within the same wavelength range. The 162 samples were divided into a training set and a prediction set at a ratio of approximately 3∶1: 120 randomly selected samples were used for model training, and the remaining 42 samples were used only to evaluate the prediction results. The predictability of the detection model was evaluated by the correlation coefficient (R) and the root mean square error of prediction (RMSEP).
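The preprocessing and band division described above can be sketched as follows. This is a minimal Python/NumPy illustration that assumes the common MSC formulation (each spectrum regressed on the mean spectrum, then corrected as (x - a) / b) and inclusive band limits; the paper does not give these details, and the helper names `msc` and `split_bands` are hypothetical.

```python
import numpy as np

# Bands examined in the paper, in micrometres (limits treated as inclusive: an assumption)
BANDS_UM = {
    "1~2.5": (1.0, 2.5),
    "1~1.7": (1.0, 1.7),
    "1~1.3": (1.0, 1.3),
    "1.3~1.7": (1.3, 1.7),
    "1.7~2.5": (1.7, 2.5),
    "2~2.2": (2.0, 2.2),
}


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is fitted as x ~ a + b * ref (ref defaults to the mean
    spectrum) and replaced by (x - a) / b.
    """
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(ref, x, deg=1)  # polyfit returns [slope, intercept]
        corrected[i] = (x - a) / b
    return corrected


def split_bands(spectra, wavelengths_um):
    """Return {band name: (wavelengths, spectra)} restricted to each band."""
    wl = np.asarray(wavelengths_um, dtype=float)
    X = np.asarray(spectra, dtype=float)
    bands = {}
    for name, (lo, hi) in BANDS_UM.items():
        mask = (wl >= lo) & (wl <= hi)
        bands[name] = (wl[mask], X[:, mask])
    return bands


# Synthetic example: three spectra that differ only by offset and scale
wl = np.arange(1.0, 2.5 + 1e-9, 0.002)  # 2 nm grid, in micrometres
peak = np.exp(-((wl - 1.45) / 0.05) ** 2)
X = np.array([0.1 + 1.0 * peak, 0.3 + 1.5 * peak, -0.05 + 0.8 * peak])
X_msc = msc(X)                          # rows become (numerically) identical
bands = split_bands(X_msc, wl)
print({name: Xb.shape for name, (_, Xb) in bands.items()})
```

Because the synthetic spectra differ only by an additive offset and a multiplicative scale, MSC maps all of them onto the mean spectrum, which is the kind of scatter effect the correction is meant to remove.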

Fig.1 Schematic diagram of the analysis of different spectral bands and resolutions

Fig.2 Parameter selection of the genetic algorithm: (a) iteration parameter; (b) population size; (c) mutation probability; (d) crossover probability

1.3 Model parameter settings

This study adopted the GA-BP neural network algorithm[16] to establish a quantitative model of sucrose-doped content in tea. Several parameters of the algorithm affect the prediction performance of the model and had to be determined on the sample set: the iteration parameter, population size, crossover probability and mutation probability of the GA, and the number of epochs, training target error, learning rate, training function and node transfer functions of the BP neural network.

2 Results and discussion

2.1 Parameter optimization

The parameter selection criterion was the R and RMSEP between the predicted and standard values of the 42 prediction samples, which did not participate in training. Better prediction performance corresponds to a larger R and a smaller RMSEP. When one parameter was varied, the other parameters and the sample set were held fixed. The results are shown in Fig.2.
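The two evaluation statistics can be computed as below. This Python sketch assumes that R is the Pearson correlation between predicted and reference values and that RMSEP = sqrt(Σ(ŷᵢ - yᵢ)² / n) over the n prediction samples (n = 42 here); the paper does not state its exact formulas, and `r_and_rmsep` is a hypothetical helper name.

```python
import numpy as np


def r_and_rmsep(y_true, y_pred):
    """Pearson R and RMSEP between reference and predicted contents."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r = np.corrcoef(y_true, y_pred)[0, 1]
    rmsep = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return float(r), float(rmsep)


# A constant bias leaves R at 1 but shows up in RMSEP
y_ref = np.array([0.91, 5.0, 10.2, 22.6])
r, rmsep = r_and_rmsep(y_ref, y_ref + 0.5)
print(round(r, 6), round(rmsep, 6))  # 1.0 0.5
```

Since R is insensitive to a constant bias, considering R and RMSEP together, as the selection criterion does, guards against models that correlate well but are systematically offset.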

Based on Fig.2, the GA parameters were set as follows: the iteration parameter was 60, the population size was 30, the mutation probability was 0.003 5 and the crossover probability was 0.99.

Training algorithms for BP neural networks mainly include gradient descent, quasi-Newton, Levenberg-Marquardt (L-M) and Bayesian regularization methods; the most suitable choice depends on the training set, the complexity of the research object and the size of the network. Several representative training algorithms were tested.

Table 1 Test results of different training functions

Based on the test results in Table 1, trainbr (Bayesian regularization) was selected as the training function.

Three main node transfer functions are available for the BP neural network: logsig, tansig and purelin. The combination of hidden-layer and output-layer transfer functions affects the model prediction results; the test results are shown in Table 2.

Taking R, RMSEP and model training time into account together, purelin and tansig were selected as the node transfer functions of the hidden layer and the output layer, respectively.
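logsig, tansig and purelin are MATLAB neural network transfer functions. For readers working outside MATLAB, the sketch below gives NumPy equivalents of their standard definitions and a forward pass with the selected combination (purelin hidden layer, tansig output layer). The weight shapes, the random inputs and the `forward` helper are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def logsig(n):
    """Log-sigmoid: 1 / (1 + exp(-n)), output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))


def tansig(n):
    """Tan-sigmoid: 2 / (1 + exp(-2n)) - 1, mathematically equal to tanh(n)."""
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0


def purelin(n):
    """Linear transfer function."""
    return n


def forward(x, W1, b1, W2, b2):
    """One hidden layer with purelin nodes and a tansig output layer."""
    hidden = purelin(x @ W1 + b1)
    return tansig(hidden @ W2 + b2)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                      # 5 spectra, 8 wavelength points
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)  # 16 hidden neurons
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
print(forward(x, W1, b1, W2, b2).shape)          # (5, 1)
```

With a tansig output layer, predictions lie in [-1, 1], so the sucrose content has to be scaled into that range before training (MATLAB's mapminmax maps to [-1, 1] by default); the paper does not state which normalization was used. With a purelin hidden layer, the network as a whole computes tanh of an affine function of the inputs.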

The remaining BP neural network parameters were determined by model training: 16 neurons in the hidden layer, 100 epochs, a learning rate of 0.01 and a training target error of 0.000 001.

Table 2 Test results of different node transfer functions
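To make the GA-BP scheme concrete, the sketch below shows one common way of coupling the two parts: a real-coded GA searches for the initial weights and biases of the network, and the best individual is then refined by back-propagation. It reuses the settings reported above (population 30, 60 generations, crossover probability 0.99, mutation probability 0.003 5, 16 hidden neurons, learning rate 0.01, 100 epochs, target error 10⁻⁶). Tournament selection, arithmetic crossover, Gaussian mutation and elitism are assumptions, as is plain batch gradient descent, which stands in here for MATLAB's trainbr; this is not the authors' implementation.

```python
import numpy as np


def unpack(theta, n_in, n_hid):
    """Split a flat chromosome into (W1, b1, W2, b2)."""
    i = n_in * n_hid
    W1 = theta[:i].reshape(n_in, n_hid)
    b1 = theta[i:i + n_hid]
    W2 = theta[i + n_hid:i + 2 * n_hid].reshape(n_hid, 1)
    b2 = theta[i + 2 * n_hid:i + 2 * n_hid + 1]
    return W1, b1, W2, b2


def predict(theta, X, n_hid=16):
    """purelin hidden layer, tansig (tanh) output layer."""
    W1, b1, W2, b2 = unpack(theta, X.shape[1], n_hid)
    return np.tanh((X @ W1 + b1) @ W2 + b2).ravel()


def mse(theta, X, y, n_hid=16):
    return float(np.mean((predict(theta, X, n_hid) - y) ** 2))


def ga_init(X, y, n_hid=16, pop=30, gens=60, p_cross=0.99, p_mut=0.0035, seed=None):
    """Real-coded GA searching for initial weights (lower training MSE = fitter)."""
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1] * n_hid + 2 * n_hid + 1
    P = rng.uniform(-0.5, 0.5, size=(pop, n_genes))
    for _ in range(gens):
        fit = np.array([mse(ind, X, y, n_hid) for ind in P])
        a, b = rng.integers(pop, size=(2, pop))          # binary tournament
        parents = P[np.where(fit[a] < fit[b], a, b)]
        children = parents.copy()
        for k in range(0, pop - 1, 2):                   # arithmetic crossover
            if rng.random() < p_cross:
                w = rng.random()
                children[k] = w * parents[k] + (1 - w) * parents[k + 1]
                children[k + 1] = w * parents[k + 1] + (1 - w) * parents[k]
        mutate = rng.random(children.shape) < p_mut      # Gaussian mutation
        children[mutate] += rng.normal(0.0, 0.1, size=int(mutate.sum()))
        children[0] = P[np.argmin(fit)]                  # elitism
        P = children
    fit = np.array([mse(ind, X, y, n_hid) for ind in P])
    return P[np.argmin(fit)].copy()


def bp_train(theta, X, y, n_hid=16, lr=0.01, epochs=100, goal=1e-6):
    """Batch gradient descent on MSE (a simple stand-in for trainbr)."""
    theta = theta.copy()
    for _ in range(epochs):
        W1, b1, W2, b2 = unpack(theta, X.shape[1], n_hid)
        h = X @ W1 + b1
        out = np.tanh(h @ W2 + b2).ravel()
        err = out - y
        if np.mean(err ** 2) < goal:
            break
        dz = (2.0 * err / len(y) * (1.0 - out ** 2))[:, None]  # dMSE/dz
        dh = dz @ W2.T
        grad = np.concatenate([(X.T @ dh).ravel(), dh.sum(axis=0),
                               (h.T @ dz).ravel(), dz.sum(axis=0)])
        theta -= lr * grad
    return theta


# Hypothetical synthetic data with targets already scaled into (-1, 1)
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = np.tanh(X @ rng.normal(scale=0.3, size=5))
theta0 = ga_init(X, y, seed=1)
theta = bp_train(theta0, X, y)
print(mse(theta0, X, y), mse(theta, X, y))
```

Seeding back-propagation with GA-selected weights is meant to reduce sensitivity to random initialization; in practice the NIR spectra (many wavelength points) would replace the five synthetic inputs.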

2.2 Spectral bands analysis results

After parameter optimization, a GA-BP neural network model was established for each spectral band at a resolution of 2 nm. The training and prediction sets were randomly re-selected for repeated tests, and R and RMSEP were recorded each time; the averages over 100 runs were taken as the final values. The prediction results for each spectral band are shown in Fig.3.
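The evaluation protocol (random 120/42 splits, repeated 100 times, with R and RMSEP averaged) can be sketched as follows. `fit_predict` is a hypothetical placeholder for the GA-BP model; the ordinary least-squares stand-in and the synthetic data exist only to make the example runnable.

```python
import numpy as np


def repeated_split_eval(X, y, fit_predict, n_train=120, n_repeats=100, seed=0):
    """Average R and RMSEP over repeated random training/prediction splits."""
    rng = np.random.default_rng(seed)
    rs, rmseps = [], []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        train, test = idx[:n_train], idx[n_train:]      # e.g. 120 / 42 samples
        y_pred = fit_predict(X[train], y[train], X[test])
        rs.append(np.corrcoef(y[test], y_pred)[0, 1])
        rmseps.append(np.sqrt(np.mean((y_pred - y[test]) ** 2)))
    return float(np.mean(rs)), float(np.mean(rmseps))


def least_squares(X_train, y_train, X_test):
    """Hypothetical stand-in model: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


rng = np.random.default_rng(42)
X = rng.normal(size=(162, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=162)
mean_r, mean_rmsep = repeated_split_eval(X, y, least_squares)
print(mean_r, mean_rmsep)
```

Averaging over many random splits reduces the dependence of the reported R and RMSEP on any single, possibly lucky or unlucky, partition of a small sample set.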

Fig.3 Model prediction results of different spectral bands at the same resolution

Fig.3 shows that the 1~1.3 μm band gave the smallest R and the largest RMSEP, indicating that this band cannot be used for quantitative detection of sucrose-doped content in tea. The prediction results for the 1~2.5, 1~1.7, 1.3~1.7 and 1.7~2.5 μm sets indicate that each of these sets can be used alone to establish a sucrose-doped content detection model, and the differences among their prediction results were minimal. Comparing the results for the 1~1.3, 1.3~1.7 and 1~1.7 μm sets shows that the 1~1.7 μm band is less effective than the 1.3~1.7 μm band alone and contains redundancy: its 1~1.3 μm portion is invalid and negatively impacts the model, decreasing its accuracy. Comparing the 1.7~2.5 and 2~2.2 μm bands shows that the 2~2.2 μm band can also be used to establish the model, although its accuracy needs to be improved.

2.3 Spectral resolution analysis results

The resolution of the 1~2.5 μm band was varied from 2 to 20 nm by averaging adjacent point values. At each resolution, the model was trained 100 times and the average values were recorded, as shown in Fig.4.
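Lowering the resolution by averaging point values can be illustrated as below: on a 2 nm grid, averaging k adjacent points gives a 2k nm sampling interval (k = 10 for 20 nm). This is a minimal sketch assuming non-overlapping groups, with any incomplete trailing group dropped; the paper does not specify how groups or edges were handled, and `degrade_resolution` is a hypothetical name.

```python
import numpy as np


def degrade_resolution(wavelengths, spectra, factor):
    """Average each run of `factor` adjacent points (2 nm -> 2 * factor nm).

    Trailing points that do not fill a complete group are dropped.
    """
    wl = np.asarray(wavelengths, dtype=float)
    X = np.asarray(spectra, dtype=float)
    n = (wl.size // factor) * factor
    wl_new = wl[:n].reshape(-1, factor).mean(axis=1)
    X_new = X[:, :n].reshape(X.shape[0], -1, factor).mean(axis=2)
    return wl_new, X_new


wl = np.arange(1000, 2500, 2.0)         # 750 points on a 2 nm grid (nm)
X = np.random.default_rng(0).random((3, wl.size))
for step_nm in (2, 4, 10, 20):
    wl_k, X_k = degrade_resolution(wl, X, step_nm // 2)
    print(step_nm, X_k.shape)
# 2 (3, 750)
# 4 (3, 375)
# 10 (3, 150)
# 20 (3, 75)
```

Point averaging widens the sampling interval and smooths noise at the same time, which is one reason the text lists the method of altering the resolution among the possible causes of the low-resolution results.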

Fig.4 Model prediction results at different resolutions in the 1~2.5 μm spectral band

Fig.4 shows that, across the different spectral resolutions, R ranged from 0.9 to 0.95 and RMSEP from 1.7 to 2.1. To further study the application of portable NIR spectrometers, the 1~1.7 μm band data were also used for the resolution experiment, with the spectral resolution varied from 2 to 10 nm. In this case, R was in the interval 0.9~0.93 and RMSEP was between 1.95 and 2.25. The results for both bands indicate that resolution has little effect on the detection model of sucrose-doped content in tea. Notably, the models at low resolution performed better than those at high resolution. Possible reasons include differences among tea samples, spectral acquisition errors, excessive noise in the raw data, overfitting of the neural network, and the method used to alter the spectral resolution.

3 Conclusions

This paper analyzed the validity and redundancy of spectral data using the GA-BP neural network detection algorithm for sucrose-doped content in tea. First, different spectral bands were analyzed at the same resolution. The prediction results showed that the 1~2.5, 1~1.7, 1.3~1.7, 1.7~2.5 and 2~2.2 μm bands could all be used to establish a detection model, and that the 1.3~1.7 μm band gave better modeling results while matching the wavelength range of portable NIR spectrometers. Second, different spectral resolutions were analyzed within the same band. The prediction results indicated that resolution had little effect on the model, and a spectral resolution of 10~20 nm was sufficient for a portable NIR spectrometer. The analysis of different spectral bands and resolutions shows that redundancy exists in the 1~1.7 and 1~2.5 μm bands in both wavelength range and spectral resolution. Further exploring the application of low-resolution portable NIR spectrometers to tea is therefore of great significance.