Analysis of Machine Learning Algorithms for Anomaly Detection on Edge Devices
Abstract
1. Introduction
- Detailed analysis of five ML algorithms (logistic regression, support vector machine, decision tree, random forest, and artificial neural network) to determine their anomaly detection performance on traffic traces between different IoT nodes communicating over the common DS2OS middleware.
- Proposal of two general and intuitive approaches that reduce the size of an imbalanced training dataset while keeping classification results comparable: randomly under-sampling the majority class (‘NL’), and under-sampling each class by clustering and selecting the most representative samples.
- Evaluation of ML algorithm training times on Raspberry Pi 4, comparing small randomly selected imbalanced datasets with the new reduced balanced datasets, and examination of memory usage to assess suitability for resource-constrained edge devices.
2. Related Work
2.1. Edge Computing
2.2. Machine Learning
2.2.1. ML Algorithms
- Logistic regression (LR) is a linear model for classification [26]. In the scikit-learn implementation used, the model is created with the function call LogisticRegression(class_weight='balanced', max_iter=10000, n_jobs=-1), and regularization is applied by default. The solver for the optimization problem is lbfgs [27]. By default, it uses the cross-entropy loss in a multiclass case. The parameter class_weight was set to balanced mode, which uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data. The parameter max_iter sets the maximum number of iterations for the solvers to converge and was raised from the default value of 100 to 10,000 so that the solver had enough iterations to converge. The parameter n_jobs sets the number of CPU cores that can be used in case of a multiclass problem with a one-vs.-rest (OvR) scheme and was set to −1 for all runs (−1 means using all available processors), although it had no effect in this case as cross-entropy loss was used for the multiclass problem.
- Support vector machine (SVM) is a supervised learning model used for classification and regression [28]. Scikit-learn’s C-Support Vector Classification implementation is based on libsvm and is created with the function call SVC(class_weight='balanced'). By default, it uses a radial basis function (RBF) kernel and a squared l2 penalty with the regularization parameter C = 1.0 [29]. Multiclass support is handled according to a one-vs.-one scheme. The parameter class_weight was set to balanced mode, which uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data.
- Decision tree (DT) is a non-parametric supervised learning method for classification [30]. In the scikit-learn implementation used, it is defined as DecisionTreeClassifier() [31], and the default criterion for measuring the quality of a split is Gini impurity. This is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the set. No parameters were set outside of their default values.
- Random forest (RF) is an ensemble method that combines the predictions of several base estimators to improve the robustness of the estimator [32]. Each tree in the ensemble is built from a sample drawn with replacement from the training set. In the function call RandomForestClassifier(n_estimators=100, n_jobs=-1), there are 100 trees, which is also the default in the scikit-learn implementation of the algorithm, with Gini impurity as the default measure of a split’s quality. By default, each bootstrap sample has the same size as the whole training set. The parameter n_jobs was set to −1 to use all available CPU cores for parallelizing the fit and predict methods over the trees.
- Artificial neural network (ANN) is a network of connected neurons, each of which produces an output based on its inputs and a predefined activation function [33]. The Keras library with a TensorFlow backend was used to train the ANN model with 11 input nodes on the input layer, 32 nodes on a hidden layer with the relu (rectified linear) activation function, and 8 output nodes with the softmax activation function to normalize the outputs. The selected optimization function was the Adam optimizer. The loss function was sparse categorical cross entropy and the number of epochs was set to ten.
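The scikit-learn calls listed above can be collected into one short sketch. This is only an illustration under stated assumptions: the helper name build_classifiers and the toy 11-feature data are hypothetical and not taken from the paper, and the Keras ANN is left out to keep the example dependent on scikit-learn alone. The sketch also shows how the balanced mode derives class weights as n_samples / (n_classes * class_count).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_class_weight


def build_classifiers():
    """Return the four scikit-learn models with the parameters quoted in the text.

    The function name is hypothetical; only the constructor calls come from the text.
    """
    return {
        "LR": LogisticRegression(class_weight="balanced", max_iter=10000, n_jobs=-1),
        "SVM": SVC(class_weight="balanced"),
        "DT": DecisionTreeClassifier(),
        "RF": RandomForestClassifier(n_estimators=100, n_jobs=-1),
    }


# Toy imbalanced data (assumption): 90 "normal" and 10 "anomalous" samples, 11 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(90, 11)),
               rng.normal(4.0, 1.0, size=(10, 11))])
y = np.array([0] * 90 + [1] * 10)

# Balanced mode: weight_c = n_samples / (n_classes * count_c).
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)

models = build_classifiers()
for name, model in models.items():
    model.fit(X, y)
```

With 100 samples and two classes, the minority class (10 samples) receives the weight 100 / (2 * 10) = 5.0, while the majority class receives 100 / (2 * 90), which is how the balanced mode compensates for class imbalance.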
2.2.2. Evaluation Metrics
- Accuracy determines how many predictions the classifier got right out of all the predictions (Equation (1)). It is defined as the sum of the number of true positives (TP) and true negatives (TN) divided by the sum of the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN): $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$. A higher value is better when all classes contain an approximately equal number of samples, but in imbalanced datasets accuracy alone can hide misclassification of the minority classes;
- Precision is the fraction of relevant instances among the retrieved instances (Equation (2)). It is defined as the number of true positive (TP) results divided by the sum of true positive (TP) and false positive (FP) results: $\mathrm{Precision} = \frac{TP}{TP + FP}$;
- Recall is the fraction of the total amount of relevant instances that were actually retrieved (Equation (3)). It is defined as the number of true positive (TP) results divided by the sum of true positive (TP) and false negative (FN) results: $\mathrm{Recall} = \frac{TP}{TP + FN}$;
- F1 score is the harmonic mean of precision and recall (Equation (4)): $F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$. The highest possible value of F1 is 1, indicating perfect precision and recall, and the lowest possible value is 0, if either the precision or the recall is zero;
- Confusion Matrix is a specific table layout meant to visualize the performance of an algorithm, typically one from a group of supervised learning algorithms. In the Python implementation, each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class (Figure 1). Misclassified samples are easy to spot because they appear off the diagonal. The more samples found on the diagonal of the matrix, the better the model is.
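The metrics above can be illustrated with a small pure-Python sketch. The function names and the two-class toy labels (‘NL’ for normal and a hypothetical ‘AN’ for anomalous) are assumptions made for this example; the paper itself relies on library implementations. Rows of the matrix are actual classes and columns are predicted classes, as described in the list.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from the four outcome counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example (assumption): 8 normal ('NL') and 2 anomalous ('AN') samples.
y_true = ["NL"] * 8 + ["AN"] * 2
y_pred = ["NL"] * 7 + ["AN"] + ["AN", "NL"]

cm = confusion_matrix(y_true, y_pred, labels=["NL", "AN"])
# Treat 'AN' as the positive class.
tn, fp = cm[0]
fn, tp = cm[1]
metrics = binary_metrics(tp, tn, fp, fn)
print(cm)       # [[7, 1], [1, 1]]
print(metrics)
```

In this toy case the accuracy is 0.8 even though only half of the anomalous samples are detected (recall 0.5), which shows why accuracy alone is a weak indicator on imbalanced data.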
3. Results
- Imbalanced training datasets (Di)—randomly selected samples from the training set;
- Balanced datasets (DRi)—all anomalous classes and randomly selected samples from class ‘NL’;
- Balanced datasets (DCi)—selected clusters of representative samples from all classes.
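The first two dataset families can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the function names, the rounding rule, and the toy arrays are assumptions, and the clustering-based DCi datasets are not covered here.

```python
import numpy as np


def random_subset(X, y, fraction, rng):
    """Di-style subset: a simple random sample of the whole training set."""
    n = round(fraction * len(y))
    idx = rng.choice(len(y), size=n, replace=False)
    return X[idx], y[idx]


def undersample_majority(X, y, majority_label, fraction, rng):
    """DRi-style subset: keep every minority sample and a random
    fraction of the majority class."""
    majority = np.flatnonzero(y == majority_label)
    minority = np.flatnonzero(y != majority_label)
    kept = rng.choice(majority, size=round(fraction * len(majority)),
                      replace=False)
    idx = np.sort(np.concatenate([minority, kept]))
    return X[idx], y[idx]


# Toy data (assumption): 1000 'NL' samples and 50 anomalous samples.
rng = np.random.default_rng(42)
X = np.arange(2100).reshape(1050, 2)
y = np.array(["NL"] * 1000 + ["AN"] * 50)

X_d, y_d = random_subset(X, y, 0.2, rng)
X_r, y_r = undersample_majority(X, y, "NL", 0.1, rng)
print(len(y_d), int(np.sum(y_r == "NL")), int(np.sum(y_r != "NL")))
```

Applied to the 278,264 ‘NL’ samples of the training set, a fraction of 0.01 with this rounding rule keeps 2783 samples, which agrees with the DR1 row of the table of reduced datasets.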
3.1. Dataset
- Removal of corrupted data, unreadable field values;
- Replacement of NaN values in column ‘NodeType’ with ‘Malicious’;
- Replacement of all non-numeric values in column ‘value’ with numeric representations, all missing values in the same column filled with 0;
- Removal of ‘timestamp’ column from the dataset, as it is irrelevant;
- Use of label encoding on all columns except column ‘value’.
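The preprocessing steps listed above could look roughly like the following pandas sketch. Several details are assumptions: the function name preprocess, the value_map dictionary (the paper does not list which numeric representation replaces each non-numeric value), the ‘normality’ label column in the toy frame, and the use of category codes as a stand-in for label encoding. The removal of corrupted rows is omitted because its criteria are not specified.

```python
import pandas as pd


def preprocess(df, value_map):
    """Apply the listed cleaning steps to a DS2OS-like frame (sketch only)."""
    df = df.copy()
    # NaN in 'NodeType' becomes the category 'Malicious'.
    df["NodeType"] = df["NodeType"].fillna("Malicious")
    # Non-numeric entries in 'value' get a numeric stand-in (hypothetical
    # mapping), then anything still missing or unparsable is filled with 0.
    mapped = df["value"].map(lambda v: value_map.get(v, v))
    df["value"] = pd.to_numeric(mapped, errors="coerce").fillna(0)
    # The timestamp column is dropped.
    df = df.drop(columns=["timestamp"])
    # Label-encode every column except 'value'.
    for col in df.columns:
        if col != "value":
            df[col] = df[col].astype("category").cat.codes
    return df


# Toy frame (assumption): column names other than those in the list are invented.
raw = pd.DataFrame({
    "NodeType": ["/sensor", None, "/sensor"],
    "value": ["21.5", "true", None],
    "timestamp": [1, 2, 3],
    "normality": ["normal", "anomalous", "normal"],
})
clean = preprocess(raw, value_map={"true": 1, "false": 0})
print(clean)
```

Category codes are assigned in sorted order of the distinct values, which mirrors the behavior of a typical label encoder.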
3.1.1. Imbalanced Subsets
3.1.2. Random Selection of Class ‘NL’
3.1.3. Subsets of Clusters Data
Algorithm 1: Dataset reduction using clustering.
Input: Dold, t, n
Output: Dnew
Function DatasetReduction(Dold, t, n):
  Dnew ← [ ]
  for c in findClasses(Dold) do
    Xc ← extractClassPoints(Dold, c)
    db ← DBSCAN.fit(Xc)
    for l in db.clusters() do
      m ← size(l)/size(Xc)
      if m > t then
        Xp ← db.extractClusterPoints(l)
        q ← createCentroid(Xp)
        dist ← distances(q, Xp)
        Dnew ← addClosestNPoints(Dnew, dist, Xp, n)
      end
    end
  end
  return Dnew
end
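Algorithm 1 can be approximated in Python with scikit-learn's DBSCAN, as in the sketch below. It is an interpretation rather than the authors' implementation: the eps and min_samples values, the Euclidean distance to the cluster mean as the centroid distance, and the decision to skip DBSCAN noise points (label −1) are all assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def dataset_reduction(X, y, t, n, eps=0.5, min_samples=5):
    """Keep the n points closest to the centroid of every cluster whose
    share of its class exceeds the threshold t."""
    keep = []
    for c in np.unique(y):
        class_idx = np.flatnonzero(y == c)
        Xc = X[class_idx]
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit(Xc).labels_
        for l in np.unique(labels):
            if l == -1:
                continue  # assumption: noise points are not treated as a cluster
            members = np.flatnonzero(labels == l)
            m = len(members) / len(Xc)
            if m > t:
                Xp = Xc[members]
                q = Xp.mean(axis=0)                     # cluster centroid
                dist = np.linalg.norm(Xp - q, axis=1)   # distance to centroid
                closest = members[np.argsort(dist)[:n]]
                keep.extend(class_idx[closest].tolist())
    keep = np.array(sorted(keep), dtype=int)
    return X[keep], y[keep]


# Toy data (assumption): two classes, each forming one tight 2-D blob.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
               rng.normal(5.0, 0.1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

X_new, y_new = dataset_reduction(X, y, t=0.1, n=5)
print(X_new.shape, np.bincount(y_new))
```

Here t filters out small clusters and n controls how many representative samples each remaining cluster contributes, so both parameters directly determine the size of the reduced dataset.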
3.2. Evaluation of Imbalanced Training Datasets
3.2.1. Classification Results
3.2.2. Confusion Matrix (D20)
3.3. Evaluation of Balanced Training Datasets with Reduced Class ‘NL’
3.3.1. Classification Results
3.3.2. Confusion Matrix (DR5)
3.4. Evaluation of Balanced Datasets Determined with Clustering
3.4.1. Classification Results
3.4.2. Confusion Matrix (DC5)
3.5. Comparison of ML Algorithms
3.6. Edge Computing Results on Raspberry Pi 4
3.6.1. Training Time
3.6.2. Memory Usage
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Yousefpour, A.; Fung, C.; Nguyen, T.; Kadiyala, K.; Jalali, F.; Niakanlahiji, A.; Kong, J.; Jue, J.P. All one needs to know about fog computing and related edge computing paradigms: A complete survey. J. Syst. Archit. 2019, 98, 289–330.
2. Merenda, M.; Porcaro, C.; Iero, D. Edge Machine Learning for AI-Enabled IoT Devices: A Review. Sensors 2020, 20, 2533.
3. Premsankar, G.; Francesco, M.D.; Taleb, T. Edge Computing for the Internet of Things. IEEE Internet Things J. 2018, 5, 1275–1284.
4. Chen, J.; Ran, X. Deep Learning With Edge Computing: A Review. Proc. IEEE 2019, 107, 1655–1674.
5. Kozik, R.; Choras, M.; Ficco, M.; Palmieri, F. A scalable distributed machine learning approach for attack detection in edge computing environments. J. Parallel Distrib. Comput. 2018, 119, 18–26.
6. Poornima, I.G.A.; Paramasivan, B. Anomaly detection in wireless sensor network using machine learning algorithm. Comput. Commun. 2020, 151, 331–337.
7. Hasan, M.; Islam, M.M.; Zarif, M.I.I.; Hashem, M.M.A. Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches. Internet Things 2019, 7, 100059.
8. Elsayed, M.S.; Le-Khac, N.A.; Dev, S.; Jurcut, A.D. Network Anomaly Detection Using LSTM Based Auto-encoder. In Proceedings of the 16th ACM Symposium on QoS and Security for Wireless and Mobile Networks, Alicante, Spain, 16–20 November 2020.
9. Pang, G.; Shen, C.; Cao, L.; Hengel, A. Deep Learning for Anomaly Detection: A Review. ACM Comput. Surv. 2021, 54, 1–38.
10. Churcher, A.; Ullah, R.; Ahmad, J.; Rehman, S.; Masood, F.; Gogate, M.; Alqahtani, F.; Nour, B.; Buchanan, W.J. An Experimental Analysis of Attack Classification Using Machine Learning in IoT Networks. Sensors 2021, 21, 446.
11. Kim, J.M.; Cho, W.C.; Kim, D. Anomaly Detection of Environmental Sensor Data. In Proceedings of the 2020 International Conference on Information and Communication Technology Conference (ICTC), Jeju, Korea, 21–23 October 2020.
12. Janjua, Z.H.; Vecchio, M.; Antonini, M.; Antonelli, F. IRESE: An intelligent rare-event detection system using unsupervised learning on the IoT edge. Eng. Appl. Artif. Intel. 2019, 84, 41–50.
13. Sajjad, M.; Nasir, M.; Muhammad, K.; Khan, S.; Jan, Z.; Sangaiah, A.K.; Elhoseny, M.; Baik, S.W. Raspberry Pi assisted face recognition framework for enhanced law-enforcement services in smart cities. Future Gener. Comput. Syst. 2017, 108, 995–1007.
14. Anandhalli, M.; Baligar, V.P. A novel approach in real-time vehicle detection and tracking using Raspberry Pi. Alex. Eng. J. 2017, 57, 1597–1607.
15. Xu, R.; Nikouei, S.Y.; Chen, Y.; Polunchenko, A.; Song, S.; Deng, C.; Faughan, T.R. Real-Time Human Objects Tracking for Smart Surveillance at the Edge. In Proceedings of the IEEE International Conference on Communications (ICC), Kansas City, MO, USA, 20–24 May 2018; pp. 20–24.
16. Komninos, A.; Simou, I.; Gkorgkolis, N.; Garofalakis, J. Performance of Raspberry Pi microclusters for Edge Machine Learning in Tourism. In Proceedings of the Poster and Workshop Sessions of AmI-2019, the 2019 European Conference on Ambient Intelligence, Rome, Italy, 4 November 2019.
17. Kamaraj, K.; Dezfouli, B.; Liu, Y. Edge mining on IoT Devices using Anomaly Detection. In Proceedings of the APSIPA Annual Summit and Conference 2019, Lanzhou, China, 18–21 November 2019.
18. Verma, A.; Goyal, A.; Kumara, S.; Kurfess, T. Edge-cloud computing performance benchmarking for IoT based machinery vibration monitoring. Manuf. Lett. 2021, 27, 39–41.
19. Marquez-Sanchez, S.; Campero-Jurado, I.; Robles-Camarillo, D.; Rodriguez, S.; Corchado-Rodriguez, J.M. BeSafe B2.0 Smart Multisensory Platform for Safety in Workplaces. Sensors 2021, 21, 3371.
20. Liu, C.; Su, X.; Li, C. Edge Computing for Data Anomaly Detection of Multi-Sensors in Underground Mining. Electronics 2021, 10, 302.
21. Patel, K.K.; Patel, S.M. Internet of things-IOT: Definition, characteristics, architecture, enabling technologies, application & future challenges. Int. J. Eng. Comput. Sci. 2016, 6, 6122–6131.
22. Zantalis, F.; Koulouras, G.; Karabetsos, S.; Kandris, D. A Review of Machine Learning and IoT in Smart Transportation. Future Internet 2019, 11, 94.
23. Serkani, E.; Gharaee, H.; Mohammadzadeh, N. Anomaly Detection Using SVMs as Classifier and Decision Tree for Optimizing Feature Vectors. Int. J. Inf. Secur. 2019, 11, 159–171.
24. Ergen, T.; Kozat, S.S. A Novel Distributed anomaly detection Algorithm Based on Support Vector Machines. Digit. Signal Process. 2020, 99, 102657.
25. Keras. Available online: https://keras.io/ (accessed on 20 November 2020).
26. Linear Models (Logistic Regression). Available online: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression (accessed on 20 November 2020).
27. Logistic Regression. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression (accessed on 20 November 2020).
28. Support Vector Machines. Available online: https://scikit-learn.org/stable/modules/svm.html#svm (accessed on 20 November 2020).
29. SVM-libsvm. Available online: https://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf (accessed on 20 November 2020).
30. Decision Trees. Available online: https://scikit-learn.org/stable/modules/tree.html (accessed on 20 November 2020).
31. DT Function. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html (accessed on 20 November 2020).
32. Forests of Randomized Trees. Available online: https://scikit-learn.org/stable/modules/ensemble.html#forest (accessed on 20 November 2020).
33. Neural Network Models. Available online: https://scikit-learn.org/stable/modules/neural_networks_supervised.html (accessed on 20 November 2020).
34. Metrics and Scoring: Quantifying the Quality of Predictions. Available online: https://scikit-learn.org/stable/modules/model_evaluation.html (accessed on 20 November 2020).
35. DS2OS Traffic Traces. Available online: https://www.kaggle.com/francoisxa/ds2ostraffictraces (accessed on 20 November 2020).
36. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231.
37. Jang, J.; Jiang, H. DBSCAN++: Towards fast and scalable density clustering. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 3019–3029.
38. Raspberry Pi 4. Available online: https://www.raspberrypi.org/products/raspberry-pi-4-model-b/specifications (accessed on 15 January 2021).
39. Resource. Available online: https://docs.python.org/3/library/resource.html (accessed on 18 June 2021).
Dataset | Anomalous Data | Normal Data | Total |
---|---|---|---|
Original dataset (DS2OS) | 10,017 | 347,924 | 357,941 |
Training dataset (80%) | 8088 | 278,264 | 286,352 |
Test dataset (20%) | 1929 | 69,660 | 71,589 |
D1 (1%) | 102 | 2761 | 2863 |
D2 (2%) | 179 | 5548 | 5727 |
D5 (5%) | 410 | 13,908 | 14,318 |
D10 (10%) | 842 | 27,793 | 28,635 |
D15 (15%) | 1220 | 41,732 | 42,952 |
D20 (20%) | 1612 | 55,658 | 57,270 |
D40 (40%) | 3224 | 111,316 | 114,540 |
D60 (60%) | 4831 | 166,980 | 171,811 |
D80 (80%) | 6456 | 222,625 | 229,081 |
D100 (Training dataset) | 8088 | 278,264 | 286,352 |
Dataset | Anomalous Data | Normal Data | Total |
---|---|---|---|
DR01 (0.1%) | 8088 | 278 | 8366 |
DR02 (0.2%) | 8088 | 557 | 8645 |
DR05 (0.5%) | 8088 | 1391 | 9479 |
DR1 (1%) | 8088 | 2783 | 10,871 |
DR2 (2%) | 8088 | 5565 | 13,653 |
DR5 (5%) | 8088 | 13,913 | 22,001 |
DR10 (10%) | 8088 | 27,826 | 35,914 |
DR15 (15%) | 8088 | 41,740 | 49,828 |
DR20 (20%) | 8088 | 55,653 | 63,741 |
Dataset | Anomalous Data | Normal Data | Total |
---|---|---|---|
DC01 | 1770 | 256 | 2026 |
DC02 | 3061 | 535 | 3596 |
DC05 | 4834 | 1395 | 6229 |
DC1 | 6266 | 2815 | 9081 |
DC2 | 8088 | 5656 | 13,744 |
DC5 | 8088 | 14,177 | 22,265 |
DC10 | 8088 | 28,379 | 36,467 |
DC15 | 8088 | 42,580 | 50,668 |
DC20 | 8088 | 56,784 | 64,872 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Huč, A.; Šalej, J.; Trebar, M. Analysis of Machine Learning Algorithms for Anomaly Detection on Edge Devices. Sensors 2021, 21, 4946. https://doi.org/10.3390/s21144946