
Trace Determination of Hormones in Peptide Powders Based on Multi-Molecular Infrared Spectroscopic and Multivariate Data Fusion

Abstract

A direct, high-throughput and accurate method, multi-molecular infrared spectroscopic (MMIR) data fusion, was established for the trace detection of estrone, diethylstilbestrol, betamethasone and prednisone acetate in peptide powders based on near-infrared and mid-infrared fusion datasets. Tri-step MMIR with progressively improved resolution captured key fingerprint features of hormones in the regions of 3430-3250, 2950-2850, 1750-1550 and 1335-1050 cm-1, which were exclusively selected as the datasets. The best-performing back propagation neural network (BPNN) model used a mid-level fusion strategy and achieved an excellent root mean square error of prediction (RMSEP, 1.101 mg kg-1) and residual prediction deviation (RPD, 9.890). Compared with the single-dataset models, the RMSEP was reduced by more than 82.03%. This method has remarkable practical significance for the direct and accurate quality evaluation of peptide powder products, offering a favorable limit of detection (LOD) without pretreatment or chemical agents, and verifies the feasibility of combining multiple analytical instruments for rapid and comprehensive detection in powdered foods.

Keywords:
multi-molecular infrared spectroscopy; data fusion; peptide powder; hormones


Introduction

A variety of substances, including oligopeptides and polypeptides, exhibit good biological activity in human metabolism, and their superior intestinal absorption makes them a convenient protein supplement.1,2 More importantly, peptides have numerous physiological functions, such as anti-cancer,3 anti-hypertensive,4 anti-cholesterol, immunomodulation,5 hormone regulation,6 and bacteriostasis,7 etc. However, with the influx of various peptide products into the market, undesirable phenomena such as shoddy products, fake activity, and illegal additions have been observed. Illegal chemical ingredients that are not desirable in “pure” foods can have a direct negative impact on the health of consumers. Inferior peptide powder products are adulterated with trace hormones to falsify or enhance the hormone regulation of peptides and improve their physiological function. It is crucial to detect trace hormones in peptide powder products.8

At present, the analysis of peptide powder is generally complicated and cumbersome. Because peptide powder is a complex system, separation procedures such as precipitation, extraction, distillation, and chromatography are needed to isolate pure substances. In the past few decades, many chemical and biological technologies have been employed to analyze the quality of powdered foods and ingredients. Liquid chromatography-mass spectrometry (LC-MS) and gas chromatography-mass spectrometry (GC-MS) have become the most widely used methods for separation and identification.9,10 Although biochemical methods have been widely and effectively employed to evaluate the index parameters of food, they are labor-intensive and time-consuming techniques that are difficult to apply in rapid on-site inspections and routine quality control in factories.11

In response to this issue, Fourier transform infrared spectroscopy (FTIR) is a possible solution. FTIR spectroscopy is a rapid and non-destructive molecular spectroscopy that can display various material information in mixed systems, including food, enabling the quick identification of samples without separation. However, because the spectra of the individual substances superimpose and shift, much of this information is concealed. To overcome this limitation, multi-molecular infrared spectroscopy (MMIR) can achieve a step-by-step increase in resolution and collect various types of information. MMIR combines spectral analysis and chemometrics to achieve an increased resolution through one-dimensional infrared spectroscopy (IR), second derivative infrared spectroscopy (SD-IR), two-dimensional correlation infrared spectroscopy (2DCOS-IR), hyperspectral imaging, and other methods.

Tri-step infrared spectroscopy, one of the main MMIR methods, is effective for revealing the main components in complex mixtures and distinguishing trace components in highly similar systems from a qualitative perspective. IR and SD-IR can accurately determine the bands of functional groups and reveal the differences between substances, while 2DCOS-IR, constructed from IR spectra perturbed by a single factor (concentration, temperature, substance, etc.), amplifies the differences between spectra as that factor changes. Conventional IR analysis usually assigns an absorption peak to a chemical bond or functional group, whereas this technique displays multiple unique absorption peaks of the sample, forming several characteristic spectral fingerprints, namely macroscopic characteristic fingerprints. It can therefore obtain the macroscopic characteristic fingerprints of specific molecules in complex sample systems and support comprehensive analysis of specific major and minor components in complex mixed systems through qualitative and quantitative models.12 MMIR can directly identify trace harmful substances in flour, and an AlexNet model based on asynchronous MMIR reached an accuracy of 99.6%.13 2T-2DCOS directly displays the concentration change of emetic toxin in flour and, combined with principal component analysis and a support vector machine model, achieves 100% classification and recognition of flour with emetic toxin content below 1.00 mg kg-1.14 MMIR also distinguishes a variety of pearl powders and identifies adulteration using the characteristic spectra in the ranges of 1550-1350, 730-680 cm-1 and 830-880, 690-725 cm-1.15

Chemometrics integrates the principles and methods of mathematics, statistics, and computer technology to analyze chemical data. It extracts the key characteristic fingerprints of samples from chemical measurement data and links them with the original state of the sample, thereby solving descriptive and predictive problems in natural science. With the continuous emergence of multiple instruments, there has been an explosive growth of chemical analysis data. The structure of these data is similar to that in the field of economics. Therefore, using econometric data processing methods to deal with chemical problems has become a viable solution.16

The integration of the two approaches can effectively establish a food composition prediction model. Nevertheless, the accurate prediction of certain harmful substances that can affect the human body even in trace amounts poses a challenge to the accuracy of the model. This issue has become a focal point in the current academic community and cannot be overlooked.17 Different instruments provide varying degrees of accuracy in information acquisition, and integrating multiple information sources to improve the breadth and accuracy of detection is a viable strategy. Data fusion achieves this goal by carrying out multi-level and multi-dimensional analysis of various types of feature quantities. It performs automatic detection, correlation, estimation and combination, allows sensors to complement each other in time and space, and achieves more accurate prediction and description of experimental samples that are in line with current conditions.18,19 This strategy enhances the performance of the chemometric model, improves its reliability and accuracy, and allows for multi-dimensional analysis of the research object, thereby effectively improving the fault tolerance and adaptive ability of the model. A data fusion model constructed from tri-step infrared spectroscopy and electronic nose (E-nose) data interpreted the characteristic fingerprint range of alcohol and sugar in red wine. Later, a more robust quantitative model was established through partial least squares (PLS), which allowed the wines to be batch screened on site.20 Data fusion can also be applied to the detection of trace heavy metals; in one such study,21 the data fusion model had the best evaluation indicators. Based on attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectroscopy and paper spray mass spectrometry (PS-MS), a PLS model achieved better prediction in coffee through low-level and high-level fusion.22

Therefore, this study focused on the trace detection of hormones in protein peptides and aimed to establish a rapid, accurate, and field-oriented quantitative prediction model. In comparison with previous studies, this study emphasized the processing and efficient utilization of multiple data sources, yielding broader and more accurate test results that allow food quality to be evaluated comprehensively from multiple dimensions. The contents of this study were as follows: (i) MMIR was used to elucidate the spectral recognition mechanism of hormones in protein peptides and to identify the characteristic fingerprint region; (ii) a data fusion approach based on the near-infrared full wavelength and the mid-infrared fingerprint region was adopted to establish quantitative models using two regression algorithms, namely partial least squares regression (PLSR) and back propagation neural network (BPNN); (iii) high-precision quantitative prediction of target substances in complex systems of protein peptides was achieved. Notably, the fused quantitative model was compared with the single-dataset quantitative models and showed more accurate and robust prediction ability.

Experimental

Materials and chemicals

The raw materials of the peptide powder samples were purchased from major supermarkets in Shanghai and had passed the national production license and standard certification of China (GB 31645-2018, GB/T 22729-2008, GB/T 22492-2008, QB/T 4707-2014, QB/T 5298-2018 and QB/T 4588-2013), ensuring that they did not contain illegal additives such as hormones, flavors, and brighteners. Four common hormones were selected as chemical standards: estrone (CAS No. 53-16-7), diethylstilbestrol (CAS No. 56-53-1), betamethasone (CAS No. 378-44-9), and prednisone acetate (CAS No. 125-10-0), all purchased from Sinopharm Chemical Reagent Co., Ltd. (Shanghai, China). Figure 1 and Table 1 detail the structural formulas and label information of the hormones. Sodium hydroxide, ammonium acetate, hydrochloric acid, high-performance liquid chromatography (HPLC)-grade methanol and acetonitrile (ACN) were procured from Aladdin Biochemical Technology Co., Ltd. (Shanghai, China).

Table 1
Information of chemical reference standards

Figure 1
Molecular structural formula of four hormones.

Apparatus

A Spotlight 400 Fourier transform infrared spectrometer (FTIR, PerkinElmer, Waltham, Massachusetts, USA) equipped with a single-point diamond attenuated total reflection (ATR) accessory and a deuterated triglycine sulfate (DTGS) detector was used. Mid-infrared spectra (MIR) were collected using ATR-FTIR, and near-infrared spectra (NIR) were collected using a diffuse reflection integrating sphere attachment (DR-S) at room temperature. The Waters 2695 HPLC was equipped with an ultraviolet (UV) detector PDA2998 (Milford, USA). Solid phase extraction cartridges were provided by Merck KGaA (Darmstadt, Germany).

Sample pretreatment

A comparison of the label information of the selected peptide powder products showed that cod peptide powder was a food-grade peptide powder with higher purity and a lower likelihood of hormone addition to fake activity. High performance liquid chromatography (HPLC) verified that the cod peptide powder did not contain estrone, diethylstilbestrol, betamethasone or prednisone acetate. Therefore, cod peptide powder was selected as the substrate, and estrone, diethylstilbestrol, betamethasone and prednisone acetate were each mixed with it in proportion to give a series of concentrations (0, 0.5, 1, 2, 5, 10, 20, 50 and 100 mg kg-1). The prepared powder sample was then placed in a thermostatic oscillator, mixed for 24 h at 300 rpm, and finally placed in a dryer at room temperature until testing. To ensure that each sample had been mixed evenly, three subsamples were taken at different positions for spectral collection, and their spectra were compared to confirm the homogeneity of parallel samples. Spectra of peptide powders with and without added hormones were acquired directly with ATR and DR-S without any pretreatment. During pretreatment and sample testing, to prevent environmental factors from affecting the test results, the temperature of the laboratory was maintained at 15-25 °C, the relative humidity was controlled at 20-40%, and the CO2 concentration was kept between 500 and 700 ppm.

HPLC analysis

HPLC was used to detect hormones in samples. A mixed solvent of methanol and ACN (1:1) was used to extract hormones from the sample. After centrifugation, NaOH was added to the supernatant for the saponification reaction. The pH of the solution was then adjusted to approximately 6.5-7.5, and the solution was finally purified by solid phase extraction. Phase A consisted of 0.5 mmol L-1 ammonium acetate in H2O, and phase B was composed of methanol and ACN (1:1). The mobile phase was 4:6 (A:B) for the first 4.5 min and changed to 5:95 (A:B) over the next 2.5 min. In the final 3 min, it returned to 4:6 (A:B). The flow rate was maintained at 0.3 mL min-1 at a controlled temperature of 22 °C.

IR acquisition and processing

Spectra of the peptide powder without hormones, the four hormones, and the peptide powder with hormones were collected by ATR-FTIR over a scanning wavenumber range of 4000-600 cm-1. Similarly, the NIR spectra were collected with the DR-S over a scanning wavenumber range of 10000-4000 cm-1. Throughout spectrum acquisition, air was used as the background and 32 scans were accumulated per spectrum. The sample set for each concentration included a parallel group of 25 spectra, and the background was measured every hour.

To obtain comparable spectral data, the MIR and NIR spectra acquired above were preprocessed in Matlab:23 baseline correction for nonlinear baseline deviations, smoothing (13-point Savitzky-Golay polynomial fitting), and normalization (0-1). The average spectrum was also obtained using Matlab.23
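As an illustration only, the preprocessing chain above can be sketched in Python with NumPy (the study itself used Matlab). The text specifies neither the baseline algorithm nor the polynomial order, so the two-point linear baseline and the quadratic polynomial below are assumptions, and the function names `savgol_weights` and `preprocess` are hypothetical.

```python
import numpy as np


def savgol_weights(window=13, polyorder=2):
    """Smoothing weights of a centered Savitzky-Golay filter (least-squares fit)."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Design matrix with columns offsets**0, offsets**1, ..., offsets**polyorder
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps the window to the fitted value at its center
    return np.linalg.pinv(design)[0]


def preprocess(spectrum, window=13, polyorder=2):
    """Baseline correction, 13-point smoothing, then 0-1 normalization."""
    y = np.asarray(spectrum, dtype=float)
    # ASSUMPTION: baseline modeled as a straight line through the two end points
    y = y - np.linspace(y[0], y[-1], y.size)
    # Reflect the edges so that the smoothed spectrum keeps its original length
    half = window // 2
    padded = np.pad(y, half, mode="reflect")
    weights = savgol_weights(window, polyorder)
    smooth = np.convolve(padded, weights[::-1], mode="valid")
    # Min-max scaling to the 0-1 range (a completely flat spectrum would divide by zero)
    return (smooth - smooth.min()) / (smooth.max() - smooth.min())
```

Applying `preprocess` row by row to a spectral matrix yields spectra on a common 0-1 scale; any other baseline model can be substituted without changing the remaining steps.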

SD-IR was obtained by derivation of preprocessed infrared spectra (number of data points 13) using Matlab.23 The pretreated IR was mathematically processed with Matlab23 to obtain 2DCOS-IR, including synchronous spectrum and asynchronous spectrum. The specific calculation formula was as follows:24,25

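(1) $\Phi(\nu_1,\nu_2)=\dfrac{1}{m-1}\sum_{i=1}^{m}\tilde{A}(\nu_1,s_i)\,\tilde{A}(\nu_2,s_i)$
(2) $\Psi(\nu_1,\nu_2)=\dfrac{1}{m-1}\sum_{i=1}^{m}\tilde{A}(\nu_1,s_i)\sum_{j=1}^{m}N_{i,j}\,\tilde{A}(\nu_2,s_j)$
(3) $N_{i,j}=\begin{cases}0 & \text{for } i=j\\ \dfrac{1}{\pi(j-i)} & \text{otherwise}\end{cases}$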

where Φ(ν1, ν2) represents the synchronous spectrum; Ψ(ν1, ν2) represents the asynchronous spectrum; m represents the number of spectra; Ã represents the dynamic spectra obtained from the mean-centered experimental dataset; ν1 and ν2 represent spectral variables; si and sj represent values of the external perturbation variable; and Ni,j represents an element of the Hilbert-Noda transformation matrix.
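Equations 1 to 3 translate almost directly into matrix products. The following NumPy sketch is illustrative rather than the Matlab routine used in the study; the function name `two_d_cos` and the layout of the input (one row per perturbation step, one column per wavenumber) are assumptions.

```python
import numpy as np


def two_d_cos(spectra):
    """Return (synchronous, asynchronous) 2D correlation spectra.

    spectra: array of shape (m, n), one row per perturbation step
    (e.g. hormone concentration) and one column per wavenumber.
    """
    A = np.asarray(spectra, dtype=float)
    m = A.shape[0]
    # Dynamic spectra: subtract the mean spectrum over the perturbation
    dyn = A - A.mean(axis=0)
    # Hilbert-Noda matrix: 0 on the diagonal, 1 / (pi * (j - i)) elsewhere
    i, j = np.indices((m, m))
    diff = j - i
    noda = np.zeros((m, m))
    noda[diff != 0] = 1.0 / (np.pi * diff[diff != 0])
    sync = dyn.T @ dyn / (m - 1)            # equation 1
    asyn = dyn.T @ (noda @ dyn) / (m - 1)   # equation 2
    return sync, asyn
```

The diagonal of the synchronous map holds the auto-peaks, which equal the variance of the intensity at each wavenumber over the perturbation; the asynchronous map is antisymmetric because the Hilbert-Noda matrix is.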

Construction of datasets and models

To compare the performance of different models in quantitative prediction, quantitative models were constructed with a classic linear regression model and a neural network model: PLSR and BPNN. PLSR is a linear machine learning model, and BPNN is a nonlinear neural network model. Both are often used for food quality evaluation.26-28 The infrared spectrum follows the Lambert-Beer law, under which the absorption strength of a substance at a given wavelength is linearly related to the concentration of the absorbing substance, so the linear regression properties of PLSR are highly consistent with the quantitative principles of infrared spectroscopy. During training, PLSR is tuned mainly through a single hyperparameter, the number of principal components (factors). BPNN is essentially a nesting of multi-layer functions. Although it cannot provide an intuitive relationship between data, it can theoretically fit any nonlinear function with multiple inputs and outputs, making it applicable to tasks such as regression in various fields. BPNN has many hyperparameters in the training process, such as the number of network layers, the number of neurons, and the learning rate, which have a great impact on the prediction accuracy of the model.

Twenty-five infrared spectra were collected for each sample (25 for NIR and 25 for MIR). After integration and screening, 20 spectra were retained for each sample as training data. Standard normal variate (SNV) transformation was performed before training each model. A total of 720 valid spectra each were read for NIR and MIR using Matlab software23 to generate two matrices: NIR full wavelength and MIR full wavelength. After preprocessing of the original spectra, the pretreated IR was mathematically processed with Matlab to obtain 2DCOS-IR and extract the MIR fingerprint region. The low-level fusion dataset was obtained by performing matrix operations on the NIR full wavelength and MIR fingerprint region matrices. For the mid-level fusion dataset, principal component analysis (PCA) was used to extract the characteristic bands of the NIR full wavelength and the MIR fingerprint region, and matrix operations were then performed on the extracted features. The inputs to the models were the above five matrices, and the outputs of the quantitative models were labeled with the hormone content in the peptide powder. Each dataset was randomly divided into a calibration set and a validation set for model training. The calibration and validation sets were then used to train quantitative models with self-written PLSR and BPNN scripts.
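The two fusion levels can be illustrated with a short NumPy sketch. The text only says that "matrix operations" were performed, so column-wise concatenation is an assumption here, as are the function names and the number of retained components.

```python
import numpy as np


def snv(X):
    """Standard normal variate: center and scale each spectrum (row) separately."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


def pca_scores(X, n_components):
    """Scores of the first principal components, computed through an SVD."""
    centered = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def low_level_fusion(nir, mir_fingerprint):
    """ASSUMPTION: low-level fusion = side-by-side concatenation of both blocks."""
    return np.hstack([snv(nir), snv(mir_fingerprint)])


def mid_level_fusion(nir, mir_fingerprint, n_components=5):
    """ASSUMPTION: mid-level fusion = concatenation of per-block PCA scores."""
    return np.hstack([
        pca_scores(snv(nir), n_components),
        pca_scores(snv(mir_fingerprint), n_components),
    ])
```

In this sketch the low-level dataset keeps every spectral variable of both blocks, whereas the mid-level dataset has only `2 * n_components` columns. In a real workflow the PCA would be fitted on the calibration set only and then applied to the validation set.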

Parameters and cross validation

The spectra of the samples were divided into calibration and validation sets at a ratio of 3:1 while maintaining consistency in data distribution. Leave-one-out cross validation was used in model training.
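A minimal sketch of the 3:1 partition and of leave-one-out folds is given below. It uses a plain random permutation, which is a simplification: the study may have stratified the split by concentration to keep the distributions consistent. The function names and the seed are hypothetical.

```python
import numpy as np


def split_3_to_1(n_samples, seed=0):
    """Randomly split sample indices into calibration (3/4) and validation (1/4)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_cal = (3 * n_samples) // 4
    return order[:n_cal], order[n_cal:]


def leave_one_out(indices):
    """Yield (training indices, held-out index) pairs, one fold per sample."""
    indices = np.asarray(indices)
    for k in range(indices.size):
        yield np.delete(indices, k), indices[k]
```

With the 720 spectra of one dataset, `split_3_to_1(720)` gives 540 calibration and 180 validation spectra, and leave-one-out cross validation on the calibration set then fits 540 models per hyperparameter setting.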

For the PLSR model, the grid search method was used directly to determine the best model, and, to avoid overfitting, the latent variables (LVs) were selected by L1 regularization and the Haaland and Thomas criterion. For the BPNN model, the best network architecture was determined first, and the grid search method was then used to determine the best model.

Quantitative model evaluation

The evaluation parameters of the regression model were as follows: correlation coefficient of calibration set (Rc), root mean square error of calibration set (RMSEC), correlation coefficient of prediction set (Rp), root mean square error of prediction set (RMSEP) and residual prediction deviation (RPD).29 The calculation formula was as follows:

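(4) $R=\dfrac{\sum_{i=1}^{n}(y_{1i}-\bar{y}_1)(y_{2i}-\bar{y}_2)}{\sqrt{\sum_{i=1}^{n}(y_{1i}-\bar{y}_1)^2\,\sum_{i=1}^{n}(y_{2i}-\bar{y}_2)^2}}$
(5) $\mathrm{RMSE}=\sqrt{\dfrac{\sum_{i=1}^{n}(y_{1i}-y_{2i})^2}{n}}$
(6) $\mathrm{RPD}=\dfrac{\sqrt{\sum_{i=1}^{n}(y_{1i}-\bar{y}_1)^2/n}}{\mathrm{RMSE}}$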

where y1 and y2 represent the predicted and true values, respectively; ȳ1 and ȳ2 represent the averages of the predicted and true values, respectively; and n represents the number of samples.

A better quantitative model should have higher Rc, Rp and RPD values and lower RMSEC and RMSEP values. The higher the RPD value, the greater the probability that the quantitative model will accurately predict the analyte in samples outside the calibration set. If RPD > 3.0, the model has excellent robustness and high prediction accuracy, indicating that it can be applied to actual production and detection; if 2.5 < RPD < 3.0, the model has good stability and high prediction accuracy; if 2.0 < RPD < 2.5, the prediction accuracy of the model is moderate; if RPD < 2.0, the model accuracy is poor and the model cannot be used.22
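For illustration, equations 4 to 6 and the RPD grading above can be written as follows. The function names are hypothetical; the RPD numerator follows equation 6 as printed, that is, the spread of the predicted values y1 with division by n, whereas many texts use the standard deviation of the reference values instead.

```python
import numpy as np


def regression_metrics(y_pred, y_true):
    """Return (R, RMSE, RPD) following equations 4 to 6."""
    y1 = np.asarray(y_pred, dtype=float)   # predicted values
    y2 = np.asarray(y_true, dtype=float)   # true (reference) values
    d1, d2 = y1 - y1.mean(), y2 - y2.mean()
    r = (d1 * d2).sum() / np.sqrt((d1 ** 2).sum() * (d2 ** 2).sum())
    rmse = np.sqrt(((y1 - y2) ** 2).mean())
    rpd = np.sqrt((d1 ** 2).mean()) / rmse
    return r, rmse, rpd


def rpd_grade(rpd):
    """Map an RPD value to the qualitative grades used in the text.

    The text leaves the boundary values (exactly 2.0, 2.5, 3.0) unassigned;
    this sketch places them in the lower grade.
    """
    if rpd > 3.0:
        return "excellent"
    if rpd > 2.5:
        return "good"
    if rpd > 2.0:
        return "general"
    return "poor"
```

Calling `regression_metrics` once on the calibration set and once on the validation set gives Rc and RMSEC, and Rp, RMSEP and RPD, respectively.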

L1 regularization (least absolute shrinkage and selection operator, LASSO) was used to optimize the model, control its complexity, avoid overfitting, and reduce its sensitivity to noise. It adds the sum of the absolute values of the parameters to the loss function. The penalty coefficient λ penalizes unimportant features in the model and reduces their weights to zero. The calculation formula was as follows:

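(7) $\hat{\beta}_{\mathrm{LASSO}}=\arg\min\left\{\sum_{i=1}^{n}\left(y_i-\beta_0-\sum_{j=1}^{k}x_{ij}\beta_j\right)^2+\lambda\sum_{j=1}^{k}|\beta_j|\right\}$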

where β̂ is the LASSO estimate of the unknown parameters in the regression equation, yi is the dependent variable (i = 1, 2, ..., n), β0 and βj (j = 1, 2, ..., k) are the unknown parameters of the regression equation, xij are the explanatory (predictor) variables, and λ is the penalty coefficient of LASSO.
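One common way to minimize the objective in equation 7 is cyclic coordinate descent with soft thresholding. The sketch below is a generic illustration of that technique, not the authors' script; the function names and the fixed number of sweeps are assumptions. Because equation 7 has no factor of 1/2 on the squared error, the threshold is λ/2.

```python
import numpy as np


def soft_threshold(rho, t):
    """Shrink rho toward zero by t; values within [-t, t] become exactly zero."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize sum((y - b0 - X @ b)**2) + lam * sum(|b|) by coordinate descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean        # centering handles the intercept b0
    beta = np.zeros(X.shape[1])
    col_norm = (Xc ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            if col_norm[j] == 0.0:
                continue                   # constant column carries no information
            # Residual with the contribution of feature j removed
            partial = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ partial
            beta[j] = soft_threshold(rho, lam / 2.0) / col_norm[j]
    beta0 = y_mean - x_mean @ beta
    return beta0, beta
```

With λ = 0 the routine reduces to ordinary least squares, and a sufficiently large λ drives every coefficient to zero, which is the feature-screening behavior described above.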

The limit of detection (LOD) of a multivariate model can be expressed by equations 8 and 9. The blank leverage h0 is a key ingredient and takes a range of values (h0min, h0max; equations 10 and 11),30 which in turn generates a range for the LOD.

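(8) $\mathrm{LOD}_{\min}=3.3\left(\mathbf{b}^{T}\mathbf{b}\,s_x^2+h_{0\min}\,\mathbf{b}^{T}\mathbf{b}\,s_x^2+h_{0\min}\,s_{y\mathrm{cal}}^2\right)^{1/2}$
(9) $\mathrm{LOD}_{\max}=3.3\left(\mathbf{b}^{T}\mathbf{b}\,s_x^2+h_{0\max}\,\mathbf{b}^{T}\mathbf{b}\,s_x^2+h_{0\max}\,s_{y\mathrm{cal}}^2\right)^{1/2}$
(10) $h_{0\min}=\dfrac{\bar{y}^2}{\sum_{i=1}^{I}(y_i-\bar{y})^2}$
(11) $h_{0\max}=\max\left\{h_i+h_{0\min}\left[1-\left(\dfrac{y_i-\bar{y}}{\bar{y}}\right)^2\right]\right\}$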

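where sx is the instrumental noise level, sycal is the uncertainty in the calibration concentrations, b is the vector of sensitivity coefficients, h0 is the blank leverage, h0min is the minimum blank leverage, h0max is the maximum blank leverage, hi is the leverage of calibration sample i, yi is the analyte concentration in calibration sample i, ȳ is the mean calibration concentration, and I is the number of calibration samples.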
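The LOD interval can be evaluated numerically once the model quantities are available. The sketch below assumes that equation 11 is read as h0max = max{hi + h0min[1 - ((yi - ȳ)/ȳ)^2]} and that the sample leverages hi have already been computed from the calibration model; the function name and argument names are hypothetical.

```python
import numpy as np


def lod_range(b, s_x, s_ycal, y_cal, leverages):
    """Return (LOD_min, LOD_max) following equations 8 to 11.

    b: regression (sensitivity) coefficient vector
    s_x: instrumental noise level
    s_ycal: uncertainty in the calibration concentrations
    y_cal: calibration concentrations
    leverages: leverage h_i of each calibration sample (assumed precomputed)
    """
    b = np.asarray(b, dtype=float)
    y = np.asarray(y_cal, dtype=float)
    h = np.asarray(leverages, dtype=float)
    y_bar = y.mean()
    h0_min = y_bar ** 2 / ((y - y_bar) ** 2).sum()                       # equation 10
    h0_max = np.max(h + h0_min * (1.0 - ((y - y_bar) / y_bar) ** 2))     # equation 11
    bb = b @ b

    def lod(h0):
        return 3.3 * np.sqrt(bb * s_x ** 2 + h0 * bb * s_x ** 2 + h0 * s_ycal ** 2)

    return lod(h0_min), lod(h0_max)
```

For example, with calibration concentrations (1, 2, 3), leverages (0.5, 0.1, 0.5), b = (1, 0), sx = 1 and sycal = 0, the blank leverage ranges from 2.0 to 2.1, so the LOD interval is 3.3·√3 to 3.3·√3.1.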

Results and Discussion

IR analysis

In this study, cod peptide powder was used as the base, and four hormones (estrone, diethylstilbestrol, betamethasone, prednisone acetate) were added to the peptide powder. By comparing the IR spectrum of peptide powder without hormones and the four hormone standards, it was observed that the IR of the four hormones and the peptide powder had unique characteristics, as shown by the position of absorption peaks, relative intensities, and peak areas (Figure 2a). Although the compositions were different, most of the absorption peaks in the spectra overlapped with those of the peptide powder. The group-peak matching method was used to analyze the overall characteristics of the infrared spectrum of each component.

Figure 2
Comparison of IR (a) and SD-IR (b) of protein peptide powder and four hormone standards.

The four hormones had -OH vibration absorption peaks around 3430-3330 and 1290-1190 cm-1. However, a broad -NH absorption peak of the peptide powder around 3294 cm-1 covered the -OH absorption peak of the four hormones. Additionally, the four hormones had stretching vibration absorption peaks of -CH2 and -CH3 near 2942, 1332 and 2866 cm-1. At around 1716 cm-1, three of the hormones (all except diethylstilbestrol) had a C=O stretching vibration absorption peak, which was absent in the peptide powder. In the range of 1656-1580 cm-1, the four hormones had slightly different benzene ring skeletal vibration absorption peaks, which overlapped with the amide I and amide II absorption peaks. At around 1290-1190 cm-1 and 1050 cm-1, the four hormones had C-O stretching vibration absorption peaks with differing intensities, whereas the absorption peaks of the peptide powder in this range were relatively weak.

From the above MMIR spectral analysis, it was preliminarily inferred that the differences between the mixtures formed after adding different hormones to the peptide powder were not evident in the spectrogram. This could be attributed to the low amount of hormones added and to the fact that most of the absorption peaks of the trace hormones overlapped with those of the peptide powder. Hence, the characteristic absorption peaks of the trace hormones could not stand out from the absorption peaks of the peptide powder, and further analysis was required.

SD-IR analysis

In order to further analyze the similarities and differences between the peptide powder and the four hormones in their characteristic absorption peaks, SD-IR was used to enhance the apparent resolution of the IR spectrum. This method was chosen to expand the subtle differences between the peptide powder and the four hormone spectra and to separate the overlapping absorption peaks. By doing so, it was possible to obtain and identify the component information of overlapping and low absorption intensity peaks in the hybrid system more clearly.

Figure 2b shows the wavenumber ranges of 3600-2700 and 1750-900 cm-1 selected for local infrared spectral analysis. The SD-IR spectrum separated the spectral peak information that overlapped in the IR spectrum. For example, the absorption peak difference of -OH was amplified, and the weaker overlapping absorption peaks of -CH2 and -CH3 around 2942 and 2866 cm-1 were separated into multiple distinct spikes that were easier to identify. Although the absorption peaks in the range of 1650-1580 cm-1 were separated and the difference was obvious, they were still covered by the amide I and amide II bands of the peptide powder. The absorption peak of the benzene ring around 1498 cm-1 was also still covered by a newly resolved absorption peak of the peptide powder, and the absorption peaks of -CH3, C-O, -OH, and =CH in the range of 1300-900 cm-1 separated into multiple distinct peaks. Although their absorption intensity was stronger than that in the peptide powder, at lower concentrations they were still obscured by the corresponding absorption peaks.

2DCOS-IR analysis

When the concentration changed, the absorption peak intensity of hormones in peptide powder would also change, while the spectrum of peptide powder would not change. 2DCOS-IR can capture this change and generate auto-peaks and cross-peaks with different intensities, which represented the changes in spectral intensity at the corresponding variable. Different concentrations of each hormone were used as disturbance factors to observe the dynamic changes of peptide powder spectra, and the main fingerprints in wavenumber segmentation were analyzed by 2DCOS-IR (Figure 3).

Figure 3
MIR and 2DCOS-IR of peptide powder with diverse concentration of hormones: (a) dynamic IR spectra corresponding to concentration variation; (b) 2DCOS-IR in the ranges of 3600-2800 and 1750-900 cm-1; (c) 2DCOS-IR and auto-peaks in the ranges of 1750-1400, 1400-1258, 1258-1140 and 1140-900 cm-1.

2DCOS-IR was performed for the 3600-2800 and 1750-900 cm-1 regions based on the results obtained from the IR and SD-IR spectra (Figure 3a). The results clearly demonstrated that the 2DCOS-IR spectra of peptide powder with different hormone concentrations exhibited significant differences in the 3600-2800 cm-1 region (see Supplementary Information (SI) section, Figure S1, BT-1, CT-1, CS-1, YX-1). Conversely, in the 1750-900 cm-1 region, the 2DCOS-IR spectra of peptide powder with different added hormones exhibited minimal differences (Figure S1, BT-3, CT-3, CS-3, YX-3), which was consistent with the results of the IR and SD-IR spectrum analyses.

The 1750-900 cm-1 region of the 2DCOS-IR was divided into four sub-bands: 1750-1400, 1400-1258, 1258-1140, and 1140-900 cm-1, as depicted in Figure 3c. Differences between the 2DCOS-IR spectra of peptide powders with different added hormones were analyzed separately for each sub-band. At shorter wavelengths, Figures S2 and S3 (SI section) showed minor differences in depth resolution. With the exception of the peptide powder with added estrone, the other three hormone-added peptide powders exhibited a weak auto-peak near 1731 cm-1, and the relative strength of this peak varied slightly (see SI section, Figure S2, BT2-1; Figure S3, CS2-1, YX2-1). At 1660, 1605, and 1551 cm-1, the absorption peaks of the amide I and II bands in the peptide powder were obscured, but the auto-peak intensity at these positions was enhanced for the peptide powders with added prednisone acetate and diethylstilbestrol (see SI section, Figure S3, CS2-1, YX2-1). Six weak auto-peaks, occurring at 1325, 1312, 1272, 1248, 1218, and 1188 cm-1, were observed in the range of 1335-1170 cm-1; the auto-peak differences in this band were mainly shifts in peak position, with no significant differences in peak intensity. An auto-peak was noted around 1050 cm-1; it was most prominent in the peptide powder with added betamethasone (see SI section, Figure S2, BT2-7) but absent in the peptide powder with added diethylstilbestrol (see SI section, Figure S3, YX2-7). Finally, a table of characteristic peaks for the hormones was generated based on the above analysis (Table 2).

Table 2
Attribution of characteristic peaks for hormones

The IR, SD-IR and 2DCOS-IR analyses revealed that the key fingerprint features of the hormones were in the regions of 3430-3250, 2950-2850, 1750-1550 and 1335-1050 cm-1. Tri-step MMIR combined with the group-peak matching method showed that the characteristic fingerprint absorption peaks of the four hormones were responsible for the differences in the 2DCOS-IR spectra, and it also verified that the infrared spectrum can be used to resolve the macroscopic spectral fingerprint of peptide powder with trace hormones. This approach is similar to feature selection: the retained data are closely related to chemical significance, which effectively simplifies the dataset and increases the weight of key information.
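Restricting a spectral matrix to the four fingerprint regions amounts to a boolean mask over the wavenumber axis. The following NumPy sketch is illustrative; the function name and the assumption that all spectra share one wavenumber axis are not taken from the study.

```python
import numpy as np

# Fingerprint regions reported above, as (low, high) wavenumbers in cm-1
FINGERPRINT_REGIONS = [(3250, 3430), (2850, 2950), (1550, 1750), (1050, 1335)]


def select_regions(wavenumbers, spectra, regions=FINGERPRINT_REGIONS):
    """Keep only the columns whose wavenumber falls inside one of the regions.

    wavenumbers: 1D array of length n (shared by all spectra)
    spectra: 2D array of shape (n_samples, n)
    """
    wn = np.asarray(wavenumbers, dtype=float)
    mask = np.zeros(wn.shape, dtype=bool)
    for low, high in regions:
        mask |= (wn >= low) & (wn <= high)
    return wn[mask], np.asarray(spectra)[:, mask]
```

For a hypothetical axis sampled every 1 cm-1 from 4000 to 600 cm-1 (3401 points), the mask keeps 769 points, so the fingerprint dataset is less than a quarter of the size of the full MIR dataset.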

Quantitative models based on single dataset

According to the MMIR analysis, the MIR fingerprint regions of the four hormones (betamethasone, prednisone acetate, estrone, and diethylstilbestrol) in peptide powder were found to be in the ranges of 3430-3250, 2950-2850, 1750-1550 and 1335-1050 cm-1. Quantitative models were established to predict the content of the four hormones in peptide powder based on the single datasets of NIR full wavelength, MIR full wavelength, and MIR fingerprint region. PLSR and BPNN were used to construct the quantitative models for the different datasets. The most critical step toward improving the prediction accuracy was to build the low-level fusion and mid-level fusion quantitative models. To demonstrate the construction of the models clearly, the model performance plots were condensed into one figure (Figure 4), which shows the performance of the CS model, a model with good predictive ability.

Figure 4
Prediction and actual value plots of CS prediction models based on datasets (NIR full wavelength, MIR full wavelength, MIR fingerprint region, low-level fusion and mid-level fusion).

In the evaluation of the quantitative models, a larger R and a smaller RMSE indicated better predictive power and accuracy. RMSEC and RMSEP indicated the difference between predicted and measured values; the closer the two were to each other, the closer the results of the model on the calibration and validation sets, meaning that the model performance was more stable. The predictive effectiveness of the model increased with the RPD, which relates the prediction error to the variation in the observations and provides a more objective indicator of model validity than RMSEP, making quantitative models easier to compare.

LASSO regularization for optimization

LASSO regularization was used for model optimization. In the second term of equation 7, λ limits the range of the coefficients βj, setting some low-weight feature coefficients to zero and retaining the high-weight ones, which limits the complexity of the model. The goal of LASSO was to find a suitable λ that minimized the mean squared error (MSE) of the equation. In the optimization process, a series of λ values (10^-5 to 10^0) was selected and the MSE of the equation was calculated for each. Figure 5 shows the trend of MSE with λ. The blue dots represent the average MSE obtained at each λ, and the error intervals of the different models are also marked. The λ at the minimum MSE was 0.021. This method could also filter the LVs in the PLS model (reducing some LV weights to 0). After screening, a total of 13-24 LVs were selected and validated using the Haaland and Thomas criterion.

Figure 5
Pathway plot of MSE corresponding to different λ.

Haaland and Thomas criterion

The Haaland and Thomas criterion was used to optimize the number of LVs in PLS.31 This method first calculated the predicted residual error sum of squares (PRESS) for different numbers of LVs, taking into consideration the correlation of prediction errors. It then performed an F-test comparing the model with the number of LVs (h) that gave the minimum PRESS against models with fewer LVs (H), and selected the model with the fewest LVs whose PRESS was not significantly greater than that of the model with h LVs. The specific steps were as follows: (i) compute F(H) = PRESS_H / PRESS_h (H = 1, 2, 3, ..., h); (ii) choose as the optimal number of LVs the smallest H for which F(H) < Fα; m, m, where Fα; m, m is the (1 - α) percentile of Snedecor's F distribution with m and m degrees of freedom (m is the number of calibration samples). This study used α = 0.25 for the F-test, and the various models retained 8-17 LVs for training. Tables 3 and 4 present the specific model evaluation parameters.
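Steps (i) and (ii) can be sketched in a few lines of plain Python. The critical value Fα; m, m is passed in as an argument rather than computed, and the function name and the PRESS values in the usage note are hypothetical.

```python
def haaland_thomas(press, f_crit):
    """Select the number of latent variables from a list of PRESS values.

    press[k] is the PRESS of the model with k + 1 latent variables;
    f_crit is the (1 - alpha) percentile of the F distribution with
    (m, m) degrees of freedom, supplied by the caller.
    """
    # h: (0-based) position of the model with the minimum PRESS
    h = min(range(len(press)), key=press.__getitem__)
    # Smallest H whose PRESS is not significantly larger than the minimum
    for H in range(h + 1):
        if press[H] / press[h] < f_crit:
            return H + 1
    return h + 1
```

For PRESS values of 10.0, 6.0, 4.2, 4.0 and 4.1 for one to five LVs and a critical value of 1.1, the minimum lies at four LVs, but the three-LV model is retained because 4.2 / 4.0 = 1.05 is below the critical value. If SciPy is available, the critical value can be obtained with `scipy.stats.f.ppf(1 - alpha, m, m)`.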

Table 3
Summary of parameters of the quantitative model of hormones based on single datasets
Table 4
Summary of parameters of the quantitative model of hormones based on fusion datasets
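
The two steps of the Haaland and Thomas criterion can be written as a short function. The sketch below is illustrative only: the PRESS values and the critical value are made-up numbers, and `haaland_thomas` is a hypothetical helper name. The critical value is passed in by the caller; it could be read from an F table or, if SciPy is available, obtained as `scipy.stats.f.ppf(1 - alpha, m, m)`.

```python
def haaland_thomas(press, f_crit):
    """Return the smallest number of LVs whose PRESS is not significantly
    larger than the minimum PRESS.

    press  -- list where press[k] is the PRESS of the model with k + 1 LVs
    f_crit -- critical value F(alpha; m, m), supplied by the caller
    """
    h = min(range(len(press)), key=press.__getitem__)  # index of minimum PRESS
    for H in range(h + 1):
        # step (i): F(H) = PRESS(H) / PRESS(h); step (ii): first H below f_crit
        if press[H] / press[h] < f_crit:
            return H + 1  # convert list index to number of LVs
    return h + 1  # only reached if f_crit <= 1

# Hypothetical PRESS values for models with 1..6 LVs (minimum at 5 LVs).
press = [52.0, 30.0, 14.0, 10.5, 10.0, 10.8]
n_lv = haaland_thomas(press, f_crit=1.2)  # 1.2 is an assumed critical value
print(n_lv)
```

With these numbers the ratios F(H) are 5.2, 3.0, 1.4 and 1.05 for H = 1 to 4, so the four-LV model is the most parsimonious one that passes the test, even though five LVs give the minimum PRESS.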

Comparison of the performance among the three datasets

The prediction accuracy of the quantitative model based on the MIR fingerprint region dataset for the four hormones was slightly better than that of the model based on the NIR full wavelength band (see SI section, Figures S4 and S5), with the RPD being less than 3.321. Because of the inherent noise and broad absorption bands of NIR,32 the peaks of trace substances may be masked, which may explain the poor performance of the NIR dataset. Simplifying the dataset and increasing the weight of key information are effective ways of improving the model, and this is also the main direction of subsequent research.

The quantitative model based on the MIR fingerprint region dataset was better than the one based on the MIR full wavelength dataset. The key reason was that perturbation of substance concentrations causes the generation of auto-peaks and cross-peaks in 2DCOS IR, so the MIR fingerprint region where these peaks are generated is highly correlated with the concentration of the target substance in the sample. The data volume of the model decreased by 77.5%. Compared with the MIR full spectra, the model built on the MIR fingerprint region used less data and achieved higher correlation and accuracy. The infrared characteristic fingerprint region was effective in retaining key useful information in the spectrum while removing redundant information and noise.33

Comparison of the performance between BPNN and PLS

The quantitative models based on PLSR and BPNN showed slight differences in the prediction of the four hormones, with the BPNN-based model performing slightly better than the PLSR-based one. Among the three single datasets, the quantitative model based on the MIR fingerprint region dataset combined with BPNN had the highest prediction accuracy and stability for the four hormones.

The model had the highest prediction accuracy for estrone, with Rp in the calibration set being 0.968, RMSEP being 7.831 mg kg-1, and RPD being greater than 3.0. On the other hand, the model had the lowest prediction accuracy for prednisone acetate, with Rp on the calibration set being 0.965, RMSEP being 8.323 mg kg-1, and RPD being less than 3.0. The prediction accuracy for betamethasone and diethylstilbestrol fell between these two. To summarize, when different datasets were compared, the quantitative model established from the MIR fingerprint region dataset was better than those from the full-band datasets; when different algorithms were compared, the quantitative model based on BPNN predicted better than the one based on PLSR.
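
For readers who want to reproduce the evaluation parameters, the sketch below computes a correlation coefficient, RMSE and RPD from paired reference and predicted values. It assumes the commonly used definitions (RPD as the standard deviation of the reference values divided by the RMSE of prediction), which may differ in detail from those applied in this study, and the concentration values are hypothetical.

```python
import math
import statistics

def rmse(actual, predicted):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rpd(actual, predicted):
    """Residual prediction deviation: SD of the reference values / RMSE."""
    return statistics.stdev(actual) / rmse(actual, predicted)

def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical prediction set (mg kg-1): reference values and model output.
actual = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 39.0, 52.0]

rmsep = rmse(actual, predicted)
print(round(rmsep, 3), round(rpd(actual, predicted), 3), round(pearson_r(actual, predicted), 3))
```

Applying the same `rmse` function to the calibration set gives RMSEC, so the gap between RMSEC and RMSEP discussed in the text can be checked directly.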

Low-level fusion analysis

Two modeling algorithms (PLSR and BPNN) were used to construct low-level fusion and mid-level fusion quantitative models, respectively, to predict the concentration of hormones (betamethasone, prednisone acetate, estrone and diethylstilbestrol) in peptide powders. The evaluation parameters of the models can be found in Table 4.

Low-level data fusion combined two heterogeneous data types (NIR and MIR), providing more dimensions and more complete information. It significantly improved the stability and predictive accuracy of the quantitative model, yielding similar predictive accuracy for the four hormones. Meanwhile, the predictive power of the PLSR- and BPNN-based models showed only minor differences for the same hormone, possibly because BPNN did not exhibit an absolute advantage in predicting low-content components (see SI section, Figures S6 and S7). All the quantitative models built on low-level data fusion had Rp greater than 0.965, RMSEP less than 8.650 mg kg-1, RMSEC less than 8.127 mg kg-1 and RPD greater than 3.437. Compared with the single datasets, RMSE decreased significantly and the gap between RMSEC and RMSEP narrowed significantly, indicating that both the accuracy and the stability of the model had improved. In summary, the quantitative models based on low-level data fusion were all substantially improved in terms of prediction accuracy for hormone content and could be used to detect hormones in actual peptide powders.

Despite this improvement, the low-level fusion quantitative model still fell short of the desired goal and still had some errors in predicting hormones. The reason was that the low-level fusion dataset contained too much irrelevant spectral data, such as signals from proteins and sugars, and noise, which inevitably interfered with the training of the model. The LVs were mapped from the original data, which meant that they may contain some of this irrelevant information. When the key parameters were not accurate enough, the predictive ability of the model inevitably weakened. At the same time, these miscellaneous data also extended the training time and reduced the accuracy and robustness of the model.

Mid-level fusion analysis

The quantitative model established by mid-level data fusion greatly improved the prediction accuracy. The model established by BPNN performed better, with Rp above 0.99, RMSEP below 3.949 mg kg-1, RMSEC below 3.872 mg kg-1 and RPD above 6.478 for all four hormones. RMSEC and RMSEP were the lowest among all models, and the gap between the two was also the smallest. In addition, LOD was calculated by multivariate calibration. The LODs of mid-level fusion based on BPNN were 6.014-6.142 mg kg-1 (BT), 4.197-4.221 mg kg-1 (CS), 5.109-5.302 mg kg-1 (CT) and 4.411-4.498 mg kg-1 (YX), which were the lowest among the models. However, the PLSR-based model showed a small advantage over the BPNN-based model in predicting prednisone acetate, although this slight advantage did not play a significant role in the prediction error.

In some recent studies,34,35 the common approach was to use NIR spectral data as model input to predict the essential components of the samples. The model evaluation results in this study showed that the model using data fusion was more accurate and robust than the one using the NIR full wavelength dataset. The improvement in RMSEP is visualized as a heatmap (Figure 6), which shows that data fusion significantly improved the predictive ability of the model. Compared to the NIR full wavelength dataset, the RMSEP of the quantitative models for betamethasone, prednisone acetate, estrone and diethylstilbestrol was reduced by 82.03, 90.61, 88.46 and 92.82%, respectively.

Figure 6
RMSEP heatmap of different models (MFR: MIR fingerprint region; LOW: low-level fusion; MID: mid-level fusion).
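
The percentage reductions in RMSEP are relative differences with respect to the baseline model. A minimal sketch of that arithmetic follows; the two RMSEP values are hypothetical placeholders rather than entries from Tables 3 and 4, and `percent_reduction` is an illustrative helper name.

```python
def percent_reduction(baseline, improved):
    """Relative decrease of `improved` with respect to `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0

# Hypothetical RMSEP values (mg kg-1) for a single-dataset and a fused model.
rmsep_nir = 10.0
rmsep_fused = 1.8
reduction = percent_reduction(rmsep_nir, rmsep_fused)
print(f"{reduction:.2f}%")
```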

This demonstrated that data fusion significantly enhanced the prediction accuracy of the models by combining information from two sources, namely NIR and the MIR fingerprint region. Mid-level fusion outperformed low-level fusion because it supplied the model with valuable information obtained after feature extraction by PCA, whereas low-level fusion simply merged the spectral data without any feature extraction or data processing to eliminate unnecessary information. The change in the number of LVs also reflected this: fewer LVs were needed in mid-level fusion, because they were mapped from data that had already been filtered for features. In conclusion, the quantitative BPNN model based on mid-level fusion gave the most accurate predictions, making it applicable to the practical detection of hormones in peptide powders.
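
The difference between the two fusion strategies can be illustrated with a small sketch. It is written under several assumptions and is not the authors' pipeline: the spectra are synthetic (a hypothetical concentration multiplied by a made-up band profile, plus noise), PCA is done by plain power iteration with deflation instead of a library routine, and the number of components per block (`k = 2`) is arbitrary.

```python
import math
import random

def pca_scores(X, k, n_iter=300):
    """Scores on the first k principal components, found by power iteration
    with deflation (a simple stand-in for a library PCA routine)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # mean-centre
    scores = [[] for _ in range(n)]
    for _ in range(k):
        v = [1.0 / math.sqrt(p)] * p  # starting direction
        for _ in range(n_iter):
            t = [sum(a * b for a, b in zip(row, v)) for row in Xc]          # t = Xc v
            w = [sum(Xc[i][j] * t[i] for i in range(n)) for j in range(p)]  # w = Xc^T t
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        t = [sum(a * b for a, b in zip(row, v)) for row in Xc]
        for i in range(n):
            scores[i].append(t[i])
            Xc[i] = [Xc[i][j] - t[i] * v[j] for j in range(p)]  # remove this component
    return scores

def low_level_fuse(nir, mir):
    """Low-level fusion: concatenate the raw variables of each sample."""
    return [list(a) + list(b) for a, b in zip(nir, mir)]

def mid_level_fuse(nir, mir, k):
    """Mid-level fusion: extract k PCA scores per block, then concatenate."""
    return low_level_fuse(pca_scores(nir, k), pca_scores(mir, k))

# Synthetic example: 6 samples, 4 NIR variables and 3 MIR variables.
random.seed(1)
conc = [5.0, 10.0, 20.0, 40.0, 80.0, 100.0]  # hypothetical concentrations
nir_profile = [0.2, 0.5, 0.3, 0.1]           # made-up band intensities
mir_profile = [0.7, 0.1, 0.4]
nir = [[c * a + random.gauss(0, 0.05) for a in nir_profile] for c in conc]
mir = [[c * a + random.gauss(0, 0.05) for a in mir_profile] for c in conc]

low = low_level_fuse(nir, mir)       # 6 samples x 7 raw variables
mid = mid_level_fuse(nir, mir, k=2)  # 6 samples x 4 extracted features
print(len(low[0]), len(mid[0]))
```

In this toy case the fused mid-level matrix is narrower than the low-level one, and its first column tracks the concentration, which is the kind of compact, concentration-related input that the text credits for the better performance of mid-level fusion.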

Based on the model obtained in the previous section, a two-tailed t-test was used to verify the significance of the mid-level fusion model results. Samples containing the four hormones at 5, 10 and 20 mg kg-1 were prepared; each was measured three times by HPLC and the average value was calculated. Five parallel predictions were made for each sample (see SI section, Table S1). With f = n - 1 = 4 and α = 0.05, the critical value is t (1 - α/2; f = n - 1) = t (0.975; 4) = 2.776, and the absolute t values of the fusion models based on BPNN were all less than 2.776. This indicated that there was no significant difference between the predicted and actual values, and that the predictions of the model were reliable. However, there was a significant difference for the fusion model based on PLSR at a concentration of 5 mg kg-1.
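
The significance check can be sketched as a one-sample t-test of the five parallel predictions against the HPLC mean. This is one reading of the procedure described above, offered as an assumption; the prediction values are hypothetical, and only the critical value 2.776 (two-tailed, α = 0.05, f = 4) is taken from the text.

```python
import math
import statistics

T_CRIT = 2.776  # two-tailed critical value for alpha = 0.05 and f = 4

def t_statistic(predictions, reference):
    """One-sample t statistic of the predictions against a reference value."""
    n = len(predictions)
    mean = statistics.mean(predictions)
    sd = statistics.stdev(predictions)  # sample SD with n - 1 in the denominator
    return (mean - reference) / (sd / math.sqrt(n))

# Hypothetical parallel predictions (mg kg-1) for a sample with HPLC mean 10.0.
unbiased = [9.8, 10.3, 10.1, 9.7, 10.4]
biased = [11.0, 11.2, 10.9, 11.1, 11.3]

for name, values in (("unbiased", unbiased), ("biased", biased)):
    t = t_statistic(values, 10.0)
    verdict = "no significant difference" if abs(t) < T_CRIT else "significant difference"
    print(name, round(t, 2), verdict)
```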

Conclusions

The adulterated hormones were generally present in trace amounts, and their characteristic peaks were mostly obscured by the absorption peaks of the peptide powder. Nevertheless, MMIR (multi-molecular infrared spectroscopy) successfully separated and extracted, by mathematical means, the main characteristic fingerprint peaks of the four hormones in the peptide powder. Although the fingerprint characteristics of different hormones varied, they were predominantly located at 3430-3250, 2950-2850, 1750-1550 and 1335-1050 cm-1. As a result, a spectral identification mechanism for hormones in peptide powders based on their characteristic fingerprints has been revealed.

The MIR (mid-infrared spectra) fingerprint region effectively removed irrelevant, redundant and noisy information while retaining useful information on the relevant substance components, thereby improving the accuracy and stability of the quantitative model for predicting hormones in peptide powders. Data fusion techniques further improved the accuracy of quantitative models based on a single dataset, with mid-level data fusion showing even greater advantages. For betamethasone, prednisone acetate, estrone and diethylstilbestrol in peptide powders, the RMSEPs (root mean square error of prediction) of the quantitative models constructed using mid-level data fusion were 82.03, 90.61, 88.46 and 92.82% lower, respectively, than those of the models constructed from the NIR (near-infrared spectra) full wavelength dataset. Regarding the two modelling algorithms used, PLSR (partial least squares regression) and BPNN (back propagation neural network), the quantitative model constructed using BPNN showed better prediction accuracy than the one constructed using PLSR. Thus, the best model for quantifying hormones in peptide powders was a regression model based on mid-level data fusion combined with BPNN.

The above quantitative model confirms that data fusion integrates the rich information obtained from multiple instrumental techniques, enabling a multidimensional, more comprehensive and accurate evaluation of food quality and enhancing the confidence level of the information. At the same time, MIR can effectively extract key information from the infrared spectrum, and selecting the characteristic fingerprint region as the input signal for the model can greatly reduce the computational effort. A quantitative model for hormones in peptide powders based on data fusion technology provides powerful technical support for the rapid quality evaluation of commercially available peptide powder products. To achieve comprehensive and complementary detection of complex samples in the future, multiplex instrumentation techniques combined with integrated analytical models would be a key candidate approach.

Supplementary Information

Supplementary data (2DCOS-IR spectra and plots of predicted versus actual values for the prediction models) are available free of charge at http://jbcs.sbq.org.br as a PDF file.

Acknowledgments

This work is financially supported by the National Natural Science Foundation of China (32371321) and the 2022 Shanghai Grain and Material Reserve Science and Technology Innovation Research Project (2022-3).

References

  • 1 Neves, M. G.; Poppi, R. J.; Breitkreitz, M. C.; Food Control 2022, 132, 108489.
  • 2 Sato, K.; J. Agric. Food Chem. 2018, 66, 3082.
  • 3 Chi, C. F.; Hu, F. Y.; Wang, B.; Li, T.; Ding, G. F.; J. Funct. Foods 2015, 15, 301.
  • 4 Lee, S. Y.; Hur, S. J.; Food Chem. 2017, 228, 506.
  • 5 Chalamaiah, M.; Yu, W. L.; Wu, J. P.; Food Chem. 2018, 245, 205.
  • 6 Lin, Y. H.; Tsai, J. S.; Chen, G. W.; J. Food Biochem. 2017, 41, 8.
  • 7 Lima, C. A.; Campos, J. F.; Lima Filho, J. L.; Converti, A.; da Cunha, M. G. C.; Porto, A. L. F.; J. Food Sci. Technol. 2015, 52, 4459.
  • 8 Chalamaiah, M.; Keskin Ulug, S.; Hong, H.; Wu, J.; J. Funct. Foods 2019, 58, 123.
  • 9 Guan, W. B.; You, Y. X.; Li, J. L.; Hong, J. Y.; Wu, H. Y.; Rao, Y. N.; J. Sci. Food Agric. 2021, 101, 2279.
  • 10 Mitema, A.; Feto, N. A.; Rafudeen, M. S.; Food Addit. Contam.: Part A 2020, 37, 2149.
  • 11 Su, W. H.; Sun, D. W.; Compr. Rev. Food Sci. Food Saf. 2018, 17, 104.
  • 12 Xie, J.; Pan, Q. N.; Li, F. L.; Tang, Y. Y.; Hou, S. W.; Xu, C. H.; Talanta 2021, 222, 121325.
  • 13 Lin, X. W.; Li, F. L.; Wang, S.; Xie, J.; Pan, Q. N.; Wang, P.; Xu, C. H.; Food Bioprocess Technol. 2023, 16, 667.
  • 14 Li, F. L.; Xie, J.; Wang, S.; Wang, Y.; Xu, C. H.; Talanta 2021, 234, 122653.
  • 15 Liu, S. Q.; Wei, W.; Bai, Z. Y.; Wang, X. C.; Li, X. H.; Wang, C. X.; Liu, X.; Liu, Y.; Xu, C. H.; Spectrochim. Acta, Part A 2018, 189, 265.
  • 16 Medina, S.; Perestrelo, R.; Silva, P.; Pereira, J. A. M.; Camara, J. S.; Trends Food Sci. Technol. 2019, 85, 163.
  • 17 Altomare, C.; Logrieco, A. F.; Gallo, A.; Encyclopedia Mycol. 2021, 1, 64.
  • 18 Kong, L. B.; Peng, X.; Chen, Y.; Wang, P.; Xu, M.; Int. J. Extreme Manuf. 2020, 2, 27.
  • 19 Khaleghi, B.; Khamis, A.; Karray, F. O.; Razavi, S. N.; Inf. Fusion 2013, 14, 28.
  • 20 Wang, S.; Hu, X. Z.; Liu, Y. Y.; Tao, N. P.; Lu, Y.; Wang, X. C.; Lam, W.; Lin, L.; Xu, C. H.; Food Chem. 2022, 372, 8.
  • 21 Zhao, Q.; Yu, Y.; Hao, N.; Miao, P.; Li, X.; Liu, C.; Li, Z.; Microchem. J. 2023, 190, 108670.
  • 22 Assis, C.; Pereira, H. V.; Amador, V. S.; Augusti, R.; de Oliveira, L. S.; Sena, M. M.; Food Chem. 2019, 281, 71.
  • 23 Matlab, version R2019b; The MathWorks Inc., Natick, MA, USA, 2007.
  • 24 Noda, I.; J. Mol. Struct. 2014, 1069, 3.
  • 25 Liu, L. Y.; Yang, R. J.; Zhang, J.; Gong, G. M.; Yang, Y. R.; J. Mol. Struct. 2020, 1214, 7.
  • 26 Vongsvivut, J.; Heraud, P.; Zhang, W.; Kralovec, J. A.; McNaughton, D.; Barrow, C. J.; Food Bioprocess Technol. 2014, 7, 265.
  • 27 Zheng, H.; Jiang, L. L.; Lou, H. Q.; Hu, Y.; Kong, X. C.; Lu, H. F.; J. Agric. Food Chem. 2011, 59, 592.
  • 28 Liu, Y. D.; Sun, X. D.; Ouyang, A. G.; LWT--Food Sci. Technol. 2010, 43, 602.
  • 29 Tamaki, Y.; Mazza, G.; J. Agric. Food Chem. 2011, 59, 504.
  • 30 Allegrini, F.; Olivieri, A. C.; Comprehensive Chemometrics, 2nd ed.; Elsevier: Oxford, UK, 2020.
  • 31 Haaland, D. M.; Thomas, E. V.; Anal. Chem. 1988, 60, 1193.
  • 32 Pasquini, C.; Anal. Chim. Acta 2018, 1026, 8.
  • 33 Lin, X. W.; Liu, R. H.; Wang, S.; Yang, J. W.; Tao, N. P.; Wang, X. C.; Zhou, Q.; Xu, C. H.; J. Agric. Food Chem. 2023, 71, 10819.
  • 34 Niemi, C.; Mortensen, A. M.; Rautenberger, R.; Matsson, S.; Gorzsas, A.; Gentili, F. G.; Food Chem. 2023, 404, 134700.
  • 35 Zhou, L.; Wang, X. F.; Zhang, C.; Zhao, N.; Taha, M. F.; He, Y.; Qiu, Z. J.; Food Bioprocess Technol. 2022, 15, 2354.

Edited by

  • Editor handled this article:
    Ivo M. Raimundo Jr. (Associate)

Publication Dates

  • Publication in this collection
    09 Dec 2024
  • Date of issue
    2025

History

  • Received
    15 July 2024
  • Accepted
    29 Oct 2024
Sociedade Brasileira de Química Instituto de Química - UNICAMP, Caixa Postal 6154, 13083-970 Campinas SP - Brazil, Tel./FAX.: +55 19 3521-3151 - São Paulo - SP - Brazil
E-mail: office@jbcs.sbq.org.br