Abstract
In this work, we study adversarial training in the presence of incorrectly labeled data. Specifically, we examine the predictive performance of an adversarially trained Machine Learning (ML) model both when it is trained on clean data and when the labels of the training data and adversarial examples contain errors. Such erroneous labels may arise organically from a flawed labeling process, or maliciously, akin to a poisoning attack.
We extensively investigate the effect of incorrect labels on model accuracy and robustness, varying 1) when incorrect labels are introduced into the adversarial training process, 2) the fraction of data affected by incorrect labels (the poisoning rate), 3) the consistency of the incorrect labels, applied either randomly or via a constant mapping, 4) the model architecture used for classification, and 5) the training settings, through an ablation study over pretraining, adversarial initialization, and adversarial training strength. We further observe that these behaviors generalize across multiple datasets.
A label may be changed to an incorrect one either in the training dataset before the model is trained, or during adversarial sample curation, where annotators mislabel the sourced adversarial examples. Interestingly, our results indicate that this flawed adversarial training process may counter-intuitively function as data augmentation, improving the adversarial robustness of the model.
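As a concrete illustration of the setup described above, the following sketch shows how labels could be flipped at a chosen poisoning rate, either to random classes or via a constant class mapping, and how one adversarial training step might then be run against those possibly incorrect labels. This is a minimal sketch in PyTorch, not the authors' implementation: the FGSM-style single-step perturbation, the epsilon budget, the constant mapping c -> (c + 1) mod num_classes, and all helper names are illustrative assumptions.

import torch
import torch.nn.functional as F

def corrupt_labels(labels, num_classes, rate, constant_mapping=False):
    # Flip a fraction `rate` of labels: either to random classes or via
    # a fixed (constant) mapping c -> (c + 1) mod num_classes. Both the
    # rate and the mapping are illustrative, not the paper's exact choices.
    labels = labels.clone()
    n_flip = int(rate * labels.numel())
    idx = torch.randperm(labels.numel(), device=labels.device)[:n_flip]
    if constant_mapping:
        labels[idx] = (labels[idx] + 1) % num_classes
    else:
        labels[idx] = torch.randint(0, num_classes, (n_flip,), device=labels.device)
    return labels

def adv_train_step(model, optimizer, x, y, eps=8 / 255):
    # Craft single-step (FGSM-style) adversarial examples against the
    # possibly incorrect labels y, then update the model on them.
    # Assumes inputs are scaled to [0, 1].
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()

    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()

In this sketch, corrupting y before crafting x_adv corresponds to errors already present in the training dataset, whereas corrupting only the labels attached to already-crafted adversarial examples corresponds to annotator mistakes during adversarial sample curation.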
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhao, B.Z.H., Lu, J., Zhou, X., Vatsalan, D., Ikram, M., Kaafar, M.A. (2025). On Adversarial Training with Incorrect Labels. In: Barhamgi, M., Wang, H., Wang, X. (eds) Web Information Systems Engineering – WISE 2024. WISE 2024. Lecture Notes in Computer Science, vol 15439. Springer, Singapore. https://doi.org/10.1007/978-981-96-0573-6_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0572-9
Online ISBN: 978-981-96-0573-6