Abstract
Rapid recognition of the source of drugs can provide valuable clues and a basis for determining the nature of a case. A novel method was put forward to identify the sources of heroin drugs rapidly and non-destructively using a hand-held near-infrared (NIR) spectrometer and a multi-layer extreme learning machine (ML-ELM) algorithm. In contrast to the traditional linear discriminant analysis (LDA), support vector machine (SVM) and extreme learning machine (ELM) algorithms, the proposed ML-ELM algorithm achieved the highest accuracy, sensitivity and specificity. For the 4 cases studied, the prediction accuracy of the ML-ELM algorithm was 25.33, 20.00 and 17.33% higher than that of the LDA, SVM and ELM algorithms, respectively. The ML-ELM models for recognizing the different sources of heroin drugs had the best generalization ability and prediction results. The experimental results indicated that the combination of hand-held NIR technology and the ML-ELM algorithm can recognize the different sources of heroin drugs rapidly, accurately and non-destructively on the spot.
Keywords: hand-held near-infrared spectroscopy; multi-layer-extreme learning machine; heroin drugs; drug source
Introduction
Heroin is one of the most common drugs in China. It damages people's health and has become a worldwide social problem. Yunnan province is located in southwest China and borders Myanmar, Laos and Vietnam. The border of Yunnan Province is more than 4000 kilometers long and is an important route for drug trafficking. In 2020, 35.4 tons of drugs were seized in Yunnan province, accounting for 47.45% of the national total. As the two most common drugs in China, methamphetamine and heroin account for more than 80% of the illicit drugs. Yunnan Province has been hit hard by drugs, especially methamphetamine and heroin.1
The source of a drug denotes the crime scene and the case from which the drug originates. Drugs from different cases are all produced illegally. Since the sources of raw materials, the production equipment, the technical level and the production process differ among illegal drug production factories, the purity, impurities, content of active ingredients and types of residual solvents also differ. This provides the basis for using modern analytical techniques to recognize the source of drugs. Rapid determination of the source can help the police to judge whether seized drugs belong to the same case, and can provide valuable clues and a basis for determining the nature of the case. Therefore, determining the sources of drugs rapidly is very important for drug control. The conventional approach to determining the sources of heroin drugs is generally based on gas chromatography-mass spectrometry (GC-MS), which is expensive, time-consuming and cannot be carried out at the crime scene. In order to carry out drug control efforts more effectively and investigate illegal heroin drugs after their confiscation, a fast and reliable qualitative determination is crucial.2 As a result, it is of great importance to develop a new method that is rapid, cheap, reliable and highly efficient.
Near-infrared (NIR) spectroscopy is a useful analytical chemistry tool with advantages such as accuracy, low cost, speed and non-destructiveness.3 In recent years, considerable effort has been invested in applying NIR spectroscopy to drug determination and identification. However, few studies have so far been reported on the recognition of different sources of heroin drugs using NIR technology. Moreover, previous research3 has shown that NIR spectral data of drugs involve a large number of correlated features, and it is difficult to find the connection between the spectral data and heroin drug sources. Hence, engineering features that represent the salient structure of the spectral data is important.
Extreme learning machine (ELM), put forward by Huang et al.,4 has been widely used in many areas, such as classification,5 regression6 and feature selection.7 Multi-layer extreme learning machine (ML-ELM) combines deep learning and ELM, with the hidden-layer representations learned in an unsupervised manner. Compared with the traditional ELM algorithm and other machine learning algorithms, the generalization performance of the ML-ELM algorithm is better.8-10 However, there are few applications to the classification of NIR spectral data. Considering the above discussion, ML-ELM is well suited to the processing of NIR spectral data.11
In this study, a novel classification method using hand-held NIR technology and the ML-ELM algorithm was put forward to recognize the sources of heroin drugs rapidly and non-destructively. The proposed technique was applied to establish heroin source recognition models and was validated with heroin samples seized in actual cases.
Experimental
Equipment and samples
A MicroNIR 1700 device (Viavi Solutions, Milpitas, CA, United States) was used to collect the spectral data of the heroin drugs. The spectral range of the device is 900-1650 nm. The MicroNIR is equipped with a 128-pixel detector array, which records data with a nominal spectral resolution of 6.25 nm. The integration time was 10 ms and each spectrum was the average of 60 scans, resulting in a measurement time of 0.60 s. During scanning, the device was placed directly on the heroin samples to acquire the NIR spectra. Meanwhile, a reference plate with reflectivity above 99% was placed under the heroin samples.
A total of 338 drug samples seized by the Yunnan police from 4 different case sources were used. All the samples were provided by public security bureaus of Yunnan province. The differences between the samples were very small and they could not be distinguished by eye. Pictures of the samples cannot be shown here for confidentiality reasons. High-performance liquid chromatography (HPLC) was used for quantitative determination of heroin in order to characterize the different cases; the purities of the heroin samples from the 4 cases are shown in Table 1, and the differences in average purity among the 4 cases are large. Besides, the statements of the suspects confirmed that the samples came from 4 different sources and that the samples in each case came from the same source. The samples were divided into 3 parts: calibration, validation and test samples. The calibration and validation samples were chosen randomly; 180 samples were used as the calibration set, 83 as the validation set and 75 as the test set. The details of the dataset are shown in Table 1.
Theory of linear discriminant analysis
Linear discriminant analysis (LDA) is a well-known dimension reduction and classification method; in its basic form it addresses binary classification problems. Suppose there is a set of samples belonging to two classes, C1 and C2. The total number of samples is n; class C1 contains n1 samples and class C2 contains n2 samples. If each sample is described by q variables, the data form a matrix X = (Xij), i = 1, …, n; j = 1, …, q. We denote by µk the mean of class Ck and by µ the mean of all the samples:
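$$\mu_k = \frac{1}{n_k}\sum_{x_i \in C_k} x_i \qquad (1)$$

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (2)$$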
Then, the between-class scatter matrix SB and the within-class scatter matrix SW can be defined as:
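$$S_B = \sum_{k=1}^{2} n_k\,(\mu_k - \mu)(\mu_k - \mu)^{T} \qquad (3)$$

$$S_W = \sum_{k=1}^{2} \sum_{x_i \in C_k} (x_i - \mu_k)(x_i - \mu_k)^{T} \qquad (4)$$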
LDA determines a vector ω such that ωtSBω is maximized while ωtSWω is minimized. This double objective is realized by the vector ωopt that maximizes the criterion:
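$$J(\omega) = \frac{\omega^{T} S_B\,\omega}{\omega^{T} S_W\,\omega} \qquad (5)$$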
It can be proved that the solution ωopt is the eigenvector associated with the only non-zero eigenvalue of SW-1SB, if SW-1 exists. Once ωopt is determined, LDA provides a classifier.12,13
Theory of support vector machine
The theory of the support vector machine (SVM) has been extensively described in the literature.14,15 Considering a binary classification problem, the objective is to predict, for each object, its class y ∈ {-1, +1} from m-dimensional input data represented by a vector x = (x1, x2, …, xm), with xi denoting the ith object of the training set. In the case of spectra, m represents the number of wavelengths. The class prediction first requires training on a data set containing the spectra of n objects or samples with known class, that is, n {x, y} pairs.
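In the widely used kernel formulation (given here only for illustration; the coefficients αi, the bias b and the kernel function K are not defined elsewhere in this text), the class of a new spectrum x is predicted from the sign of the decision function

$$f(x) = \operatorname{sign}\!\left(\sum_{i=1}^{n} \alpha_i\, y_i\, K(x_i, x) + b\right)$$

where αi and b are obtained during training and K(xi, x) is a kernel function, such as the radial basis function used later in this work.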
Theory of extreme learning machine algorithm
ELM, proposed by Huang et al.,4 shows that the hidden node parameters can be randomly generated. The input data are mapped to an L-dimensional ELM random feature space and the network output is given by equation 6:
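$$f(x) = h(x)\,\beta = \sum_{i=1}^{L}\beta_i\, g_i(x) \qquad (6)$$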
where β = [β1, …, βL]T is the output weight matrix between the hidden nodes and the output nodes, h(x) = [g1(x), …, gL(x)] contains the hidden node outputs (random hidden features) for the input x, and gi(x) is the output of the ith hidden node. Given N training samples {(xi, ti)}, i = 1, …, N, ELM solves the following learning problem:
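$$H\beta = T \qquad (7)$$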
where T = [t1, …, tN]T is the matrix of target labels and H = [hT(x1), …, hT(xN)]T is the hidden-layer output matrix. The output weights β can be calculated by equation 8:
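$$\beta = H^{\dagger}\,T \qquad (8)$$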
where H† is the Moore-Penrose generalized inverse of matrix H.
To achieve better generalization performance and a more robust solution, a regularization term can be added, as shown in equation 9:
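$$\beta = \left(\frac{I}{C} + H^{T}H\right)^{-1} H^{T}\,T \qquad (9)$$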
where C is the regularization coefficient; the value of this parameter is assigned randomly after the appropriate number of hidden layers is set.
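To make equations 6-9 concrete, a minimal ELM classifier can be sketched in Python as shown below. This is only an illustrative sketch: the hidden-layer size, the sigmoid activation and the value of C are placeholders rather than the settings used in this work, and the class labels are assumed to be one-hot encoded in T.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ELM:
    """Minimal single-hidden-layer ELM with regularized least-squares output weights."""
    def __init__(self, n_hidden=100, C=1.0, seed=None):
        self.n_hidden = n_hidden      # L, number of random hidden nodes (placeholder value)
        self.C = C                    # regularization coefficient of equation 9 (placeholder)
        self.rng = np.random.default_rng(seed)

    def fit(self, X, T):
        # Input weights and biases are randomly generated and never trained
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = sigmoid(X @ self.W + self.b)              # hidden-layer output matrix H
        # beta = (I/C + H^T H)^{-1} H^T T, the regularized solution of H beta = T
        A = np.eye(self.n_hidden) / self.C + H.T @ H
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        H = sigmoid(X @ self.W + self.b)
        return H @ self.beta                          # f(x) = h(x) beta, equation 6
```

Here T would be the one-hot encoded labels of the 4 case sources, and the predicted class of a spectrum is the column with the largest output.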
Theory of multi-layer extreme learning machine algorithm
If the number of nodes Lk in the kth hidden layer is equal to the number of nodes Lk-1 in the (k - 1)th hidden layer, g can be chosen as a linear function; otherwise, g can be a nonlinear piecewise function, e.g., the sigmoidal function.
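In the standard ML-ELM formulation, the weights of each hidden layer are learned by an unsupervised ELM autoencoder whose output weights βk map layer k - 1 to layer k, so that

$$H_k = g\!\left(H_{k-1}\,\beta_k^{T}\right) \qquad (10)$$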
where Hk is the output matrix of the kth hidden layer; if k = 0, the input data x can be considered as the 0th hidden layer. The weights connecting the last hidden layer to the output node t are calculated analytically using regularized least squares.
The steps for recognizing the sources of heroin drugs with the ML-ELM algorithm were as follows. Firstly, the calibration and validation samples were subjected to spectral pre-processing. Then, the parameters of the ML-ELM algorithm were determined and the classification models were built. Finally, the prediction results were obtained with the built ML-ELM models.
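The sketch below illustrates how such an ML-ELM model could be assembled: each hidden layer is obtained from an unsupervised ELM autoencoder whose output weights are reused to map the data forward (equation 10), and the final output weights are computed by regularized least squares. This is a hedged sketch of the general idea rather than the exact implementation used here; the layer sizes, activation function and regularization values are placeholders, and the arrays X_cal, T_cal and X_test in the usage comment are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ridge_solve(H, T, C):
    """beta = (I/C + H^T H)^{-1} H^T T (regularized least squares, equation 9)."""
    return np.linalg.solve(np.eye(H.shape[1]) / C + H.T @ H, H.T @ T)

def elm_autoencoder(H_prev, n_hidden, C, rng):
    """One unsupervised ELM-AE layer: learn beta_k so that a random hidden
    representation reconstructs H_prev, then map H_prev through beta_k^T."""
    W = rng.normal(size=(H_prev.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H_rand = sigmoid(H_prev @ W + b)
    beta_k = ridge_solve(H_rand, H_prev, C)          # reconstruct the layer input
    return sigmoid(H_prev @ beta_k.T), beta_k        # H_k = g(H_{k-1} beta_k^T), equation 10

def mlelm_fit(X, T, layer_sizes=(500, 500, 500), C=1.0, seed=0):
    """Train an ML-ELM: stacked unsupervised ELM-AE layers plus a regularized output layer."""
    rng = np.random.default_rng(seed)
    H, betas = X, []
    for L in layer_sizes:                            # placeholder layer sizes
        H, beta_k = elm_autoencoder(H, L, C, rng)
        betas.append(beta_k)
    beta_out = ridge_solve(H, T, C)                  # supervised output weights
    return betas, beta_out

def mlelm_predict(X, betas, beta_out):
    H = X
    for beta_k in betas:
        H = sigmoid(H @ beta_k.T)
    return H @ beta_out                              # class scores; argmax gives the case

# Hypothetical usage: X_cal are pre-processed spectra, T_cal one-hot labels of the 4 cases
# betas, beta_out = mlelm_fit(X_cal, T_cal, layer_sizes=(500, 500, 500))
# y_pred = mlelm_predict(X_test, betas, beta_out).argmax(axis=1)
```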
Measures of classification performance
A confusion matrix can be used not only for two-class discriminant analysis but also for multi-class discriminant analysis. Figure 1 presents the basic form of the confusion matrix for a multi-class classification task with classes A1, A2, …, An. In the confusion matrix, Nij represents the number of samples actually belonging to class Ai but classified as class Aj.
A number of measures of classification performance can be defined based on the confusion matrix (Figure 1). Some common measures are given below.
Accuracy is the proportion of the total number of predictions that were correct:
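$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (11)$$

where TP, TN, FP and FN denote the numbers of true positive, true negative, false positive and false negative predictions for the class under consideration.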
Precision is a measure of the accuracy provided that a specific class has been predicted. It is defined by:
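$$\text{Precision} = \frac{TP}{TP + FP} \qquad (12)$$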
Specificity is the proportion of actual negatives measured that were correct:
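$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (13)$$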
In equation 13, TN means true negative and FP means false positive.
Sensitivity is a measure of the ability of a prediction model to select instances of a certain class from a data set; it is defined by the formula:
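$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (14)$$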
The traditional F-score (F1 score) is the harmonic mean of precision and sensitivity:
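$$F_1 = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \qquad (15)$$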
Results and Discussion
The NIR spectra of all the samples were collected with the hand-held NIR spectrometer. No sample pre-treatment was performed. During scanning, the samples were measured without their packaging: all physical evidence bags (plastic bags) were opened to avoid interference from the polymer. The device was placed directly on the heroin samples to acquire the NIR spectra. The NIR spectra of the 4 different cases are shown in Figure 2. The peaks at 960-980 nm and 1400-1420 nm are the second and first overtones of O-H, respectively; the peaks in the range of 1490-1600 nm are attributed to the first overtone of N-H. In the literature,5 results showed that the variables located in the ranges of 1100-1250 nm and 1350-1600 nm had a great influence on heroin.
A Savitzky-Golay derivative pre-processing operation was performed on the spectral data to reduce the influence of instrument noise and improve the signal-to-noise ratio; it can also enhance small differences between absorption bands and correct for light scattering. Table 2 shows the accuracies of the calibration models built with LDA, ELM and ML-ELM for different parameters of the Savitzky-Golay derivative pre-processing operation. It can be seen from Table 2 that the accuracies of the three models were highest when the parameters were the first derivative, 7 smoothing points and a second-order polynomial. As a result, these parameters were used in the following work. Figure 3 shows that, after pre-processing, the noise was reduced and the small differences between absorption bands were enhanced.
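For illustration, this pre-processing step corresponds to a standard Savitzky-Golay derivative filter; a minimal sketch using SciPy is shown below, where the spectra array is a random placeholder standing in for the measured MicroNIR spectra.

```python
import numpy as np
from scipy.signal import savgol_filter

# Placeholder: random array standing in for the raw spectra (samples x detector pixels)
spectra = np.random.rand(338, 128)

# First derivative, 7 smoothing points, second-order polynomial (the parameters
# selected above), applied along the wavelength axis of each spectrum
spectra_sg = savgol_filter(spectra, window_length=7, polyorder=2, deriv=1, axis=1)
```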
Table 2. Accuracies of calibration models using LDA, ELM and ML-ELM with different parameters of the Savitzky-Golay derivative pre-processing operation
Then, the successive projections algorithm (SPA)16 was used to select the optimal wavelengths. The wavelengths selected by SPA were 1157, 1190, 1200, 1357, 1391, 1425 and 1570 nm. Table 3 shows the accuracies of the calibration models using LDA, ELM and ML-ELM with the wavelengths selected by SPA. It can be seen from Table 3 that the accuracies of the LDA, SVM, ELM and ML-ELM algorithms were higher than those in Table 2. As a result, these 7 selected wavelengths were used in the following experiments.
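A minimal sketch of the SPA selection chain is given below for illustration. At each step it projects the remaining spectral columns onto the orthogonal complement of the columns already selected and keeps the one with the largest residual norm; the screening of different chain lengths and starting variables against a validation model, which the full SPA performs, is omitted here.

```python
import numpy as np

def spa_chain(X, start, n_select):
    """Successive projections chain: X is a (samples x wavelengths) matrix.
    Returns the indices of n_select variables starting from column `start`."""
    Xp = X.astype(float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]]
        # Project every column onto the orthogonal complement of the last selected column
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -np.inf                    # never re-select a variable
        selected.append(int(np.argmax(norms)))
    return selected

# Hypothetical usage: pick a 7-variable chain starting from the first wavelength
# indices = spa_chain(spectra_sg, start=0, n_select=7)
```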
Table 3. Accuracies of calibration models using LDA, ELM and ML-ELM with the wavelengths selected by SPA
LDA,17 SVM,18 ELM19,20 and ML-ELM algorithms were used to classify the spectral data of the different sources of heroin drugs. Details of how the LDA, SVM and ELM algorithms handle NIR spectral data are given in the literature.17-20 To achieve a fair comparison and avoid randomness in the test results, the calibration and validation samples were chosen randomly and all the algorithms were run on the same calibration and test splits for each calculation. For ML-ELM, the number of hidden layers is an important parameter: there is a specific number of layers at which the ML-ELM achieves its highest overall accuracy, and a larger or smaller number degrades the classification performance. Therefore, the first task was to determine the number of hidden layers of the ML-ELM algorithm in order to achieve better performance with fewer parameters. Here, the sigmoid function was set as the activation function and the number of hidden nodes was set as 10 and 500. It can be seen from Figure 4 that the overall accuracy first increased and then decreased as the number of hidden layers increased, and was highest when the number of hidden layers was 3. Three hidden layers were therefore used in the following experiments.
Accuracy, precision, sensitivity and F-score were used to evaluate the calibration models, validation results and test results of each algorithm. Meanwhile, ten-fold cross-validation was used for each experiment in order to avoid over-fitting and to reflect the performance of the different predictors faithfully. For the LDA algorithm, different numbers of main factors (1-10) were tested and 8 was chosen, because the calibration, validation and prediction accuracies were highest with 8 main factors. For the SVM algorithm, the kernel function was the radial basis function (RBF), and the penalty coefficient was set to 1 using a particle swarm optimization algorithm in order to achieve the best classification performance. For the ELM algorithm, the number of hidden neurons was set randomly by the computer each time, and the transfer function was the sigmoidal function.
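As an illustration of this comparison protocol (ten-fold cross-validation, RBF-kernel SVM with C = 1), the LDA and SVM baselines can be evaluated with scikit-learn as sketched below. The arrays X_sel and y are random placeholders for the matrix of selected wavelengths and the case labels, and the main-factor optimization for LDA and the particle swarm tuning of the SVM are not reproduced here.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X_sel = rng.random((338, 7))          # placeholder for 338 samples x 7 selected wavelengths
y = rng.integers(0, 4, size=338)      # placeholder labels for the 4 case sources

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)   # ten-fold cross-validation

models = {
    "LDA": make_pipeline(StandardScaler(), LinearDiscriminantAnalysis()),
    "SVM (RBF, C=1)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}

for name, model in models.items():
    scores = cross_val_score(model, X_sel, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean cross-validated accuracy = {scores.mean():.3f}")
```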
For the sake of comparison, the performances of the LDA, SVM, ELM and ML-ELM algorithms are shown in Tables 4 and 5 in the form of confusion matrices. As shown in Tables 4 and 5, the accuracy, precision, sensitivity, specificity and F-score of the ML-ELM algorithm are the highest among the LDA, SVM, ELM and ML-ELM algorithms. A higher sensitivity means a higher recognition capability for a classification model, and a higher F-score means a lower misdiagnosis rate. Here, the sensitivity and F-score of the ML-ELM algorithm are higher than those of the LDA, SVM and ELM algorithms, which means that the recognition capability of the ML-ELM algorithm is higher and its misdiagnosis rate is lower. These results show that the ML-ELM algorithm performs best in building calibration models for the different sources of heroin drugs from NIR spectral data. In Table 5, the prediction accuracies of LDA, SVM, ELM and ML-ELM for the 4 cases are 72.00, 77.33, 88.00 and 97.33%, respectively. Therefore, the average prediction accuracy of the ML-ELM algorithm is 25.33, 20.00 and 17.33% higher than that of LDA, SVM and ELM, respectively. Besides, the calibration models built with the ML-ELM algorithm also have better prediction performance than those built with the LDA, SVM and ELM algorithms. The reason is that the ML-ELM algorithm learns its hidden representations in an unsupervised manner and can therefore extract more abstract features from the NIR spectral data than the LDA, SVM and ELM algorithms.
Besides, it can be seen in Table 5 that the computing times of the LDA, SVM, ELM and ML-ELM algorithms were 1.928, 1.022, 0.447 and 0.538, respectively. Although the computational time of the ML-ELM algorithm was not the shortest, considering the classification accuracy, the ML-ELM algorithm was still the best option for classifying the NIR spectral data of heroin drugs from different sources.
Conclusions
A hand-held NIR spectrometer combined with a ML-ELM algorithm was used to identify the different sources of heroin drugs. The experimental results showed that the accuracy, sensitivity and F-score of the ML-ELM algorithm were all higher than those of the LDA, SVM and ELM algorithms, indicating that the ML-ELM algorithm outperforms them in recognition capability, misdiagnosis rate and classification capability. The combination of NIR technology and ML-ELM is a useful tool for recognizing the source of heroin drugs. However, the source cannot be discriminated when the purity and the constituents of heroin from different case sources are very similar; this will be a focus of future work.
Acknowledgments
This work is financially supported by Physical Evidence Spectral Technology Innovation Team of Yunnan Police College in Yunnan Province (202105AE160007), Key laboratory of Spectral Technology Physical Evidence of Education of Yunnan Province, Basic Research Project of Ministry of Public Security (2020GABJC41, 2019GABJC40), Yunnan Provincial Department of Science and Technology (202001AU070004, 2018FD160), Basic Research Project of Yunnan Police College (19A006) and Yunnan Provincial Key Laboratory of Forensic Science (2020zz02, 2020zz07).
References
- 1 Tang, R.; Cai, T.; Int. J. Drug Policy 2020, 78, 102732. [Crossref]
- 2 Liu, C.-M.; Yu, H.; Min, S.-G.; Wei, J.; Xin, M.; Liu, P.-P.; Forensic Sci. Int. 2018, 290, 162. [Crossref]
- 3 Chauchard, F.; Cogdill, R.; Roussel, S.; Roger, J. M.; Bellon-Maurel, V.; Chemom. Intell. Lab. Syst. 2004, 71, 141. [Crossref]
- 4 Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K.; Neurocomputing 2006, 70, 489. [Crossref]
- 5 Chen, Y. M.; Liu, S.; Yang, Y.; Qian, Z. H.; Wang, B. Y.; An, C. Y.; Liu, C. L.; Min, S. G.; Aust. J. Forensic Sci. 2019, 53, 40. [Crossref]
- 6 Kranenburg, R. F.; Verduin, J.; Weesepoel, Y.; Alewijn, M.; Heerschop, M.; Koomen, G.; Keizers, P.; Bakker, F.; Wallace, F.; van Esch, A.; Hulsbergen, A.; van Asten, A. C.; Drug Test. Anal. 2020, 12, 1404. [Crossref]
- 7 Mao, Y. C.; Xiao, D.; Cheng, J. F.; Jiang, J. H.; Ba, T. L.; Liu, S. J.; Spectrosc. Spectral Anal. 2017, 37, 89. [Link] accessed in September 2022
- 8 Yu, Q.; van Heeswijk, M.; Miche, Y.; Nian, R.; He, B.; Séverin, E.; Lendasse, A.; Neurocomputing 2014, 129, 153. [Crossref]
- 9 Benoît, F.; van Heeswijk, M.; Miche, Y.; Verleysen, M.; Lendasse, A.; Neurocomputing 2013, 102, 111. [Crossref]
- 10 Jiang, X. W.; Yan, T. H.; Zhu, J. J.; He, B.; Li, W. H.; Du, H. P.; Sun, S. S.; Cognit. Comput. 2020, 125, 979. [Crossref]
- 11 Mai, Z.; Chen, Y.; Du, L.; IEEE Commun. Lett. 2021, 25, 1549. [Crossref]
- 12 Martis, R. J.; Acharya, U. R.; Min, L. C.; Biomed. Signal Process. Control 2013, 8, 437. [Crossref]
- 13 Peng, J.; Heisterkamp, D. R.; Dai, H. K.; IEEE Trans. Neural Networks 2003, 14, 940. [Crossref]
- 14 Cristianini, N.; Shawe-Taylor, J.; An Introduction to Support Vector Machines and Other Kernel-based Learning Methods; Cambridge University Press: London, UK, 2000. [Crossref]
- 15 Kivinen, J.; Smola, A. J.; Williamson, R. C.; IEEE Trans. Signal Process. 2004, 52, 2165. [Crossref]
- 16 Hu, R.; Zhang, L.; Yu, Z.; Zhai, Z.; Zhang, R.; Infrared Phys. Technol. 2019, 102, 102999. [Crossref]
- 17 Alomari, F.; Liu, G.; Open Autom. Control Syst. J. 2014, 6, 108. [Crossref]
- 18 Devos, O.; Ruckebusch, C.; Durand, A.; Duponchel, L.; Huvenne, J. P.; Chemom. Intell. Lab. Syst. 2009, 96, 27. [Crossref]
- 19 Chen, H.; Tan, C.; Lin, Z.; Spectrochim. Acta, Part A 2019, 229, 117982. [Crossref]
- 20 Bin, J.; Zhou, J.; Fan, W.; Li, X.; Liang, Y. Z.; Xiao, Z. X.; Li, C. S.; Acta Tab. Sin. 2017, 23, 60. [Crossref]
Edited by
Editor handled this article: Ivo M. Raimundo Jr. (Associate)
Publication Dates
Publication in this collection: 10 Mar 2023
Date of issue: Mar 2023
History
Received: 11 Mar 2022
Published: 13 Sept 2022