On Adversarial Training with Incorrect Labels

  • Conference paper
  • First Online:
Web Information Systems Engineering – WISE 2024 (WISE 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15439)

Abstract

In this work, we study adversarial training in the presence of incorrectly labeled data. Specifically, we examine the predictive performance of an adversarially trained Machine Learning (ML) model both when it is trained on clean data and when the labels of the training data and adversarial examples are erroneous. Such erroneous labels may arise organically from a flawed labeling process or maliciously, akin to a poisoning attack.
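For context (and not part of the abstract itself), adversarial training is commonly formulated as the min-max problem below, e.g., as popularized by Madry et al.; the notation (model f_θ, loss ℓ, perturbation budget ε) is the generic textbook form and only sketches the setting studied here, in which the label y may additionally be incorrect for a fraction of the training data:

$$\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\,\max_{\|\delta\|_{\infty}\le\varepsilon}\ \ell\big(f_{\theta}(x+\delta),\, y\big)\Big]$$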

We extensively investigate the effect of incorrect labels on model accuracy and robustness, varying 1) when incorrect labels are applied in the adversarial training process, 2) the fraction of data affected by incorrect labels (the poisoning rate), 3) the consistency of the incorrect labels, applied either randomly or via a constant mapping, 4) the model architecture used for classification, and 5) the training configuration, through an ablation study on pretraining, adversarial initialization, and adversarial training strength. We further observe that these behaviors generalize across multiple datasets.
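To illustrate points 2) and 3), a minimal sketch of the two corruption modes, random noise versus a constant class mapping, applied at a given poisoning rate might look as follows; the function name, its parameters, and the "+1" class mapping are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the two label-corruption modes at a given poisoning rate.
import numpy as np

def corrupt_labels(labels, num_classes, poison_rate, mode="random", seed=0):
    """Return a copy of `labels` with a fraction `poison_rate` replaced.

    mode="random": each affected label is resampled uniformly from the
                   other classes (inconsistent noise).
    mode="mapped": each affected label y is replaced by a fixed _target,
                   here (y + 1) % num_classes (a constant mapping).
    """
    rng = np.random.default_rng(seed)
    labels = labels.copy()
    n = len(labels)
    idx = rng.choice(n, size=int(poison_rate * n), replace=False)
    if mode == "random":
        # draw a different class uniformly at random for each poisoned sample
        offsets = rng.integers(1, num_classes, size=len(idx))
        labels[idx] = (labels[idx] + offsets) % num_classes
    else:
        # deterministic class-to-class relabelling
        labels[idx] = (labels[idx] + 1) % num_classes
    return labels
```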

A label may be changed to an incorrect one before the model is trained, i.e., in the training dataset, or during adversarial sample curation, where annotators make mistakes labeling the sourced adversarial examples. Interestingly, our results indicate that this flawed adversarial training process may counter-intuitively function as data augmentation, yielding improved adversarial robustness for the model.
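To make the two injection points concrete, the following PyTorch-style sketch performs one adversarial training step in which the labels used for the clean batch and for the crafted adversarial examples are passed separately, so either set can be corrupted; the single-step FGSM attack, the [0, 1] input range, and all names are assumptions made for brevity rather than the paper's actual setup.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step FGSM perturbation inside an L-infinity ball of radius eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def adv_train_step(model, optimizer, x, y_clean_side, y_adv_side, eps):
    """One adversarial-training step.

    y_clean_side: labels paired with the clean inputs (may already be wrong
                  if the training dataset itself was mislabeled).
    y_adv_side:   labels attached to the crafted adversarial examples (may be
                  wrong if annotators mislabel during curation).
    """
    x_adv = fgsm(model, x, y_adv_side, eps)
    loss = F.cross_entropy(model(x), y_clean_side) + \
           F.cross_entropy(model(x_adv), y_adv_side)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```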



Author information

Corresponding author

Correspondence to Muhammad Ikram.

A Mapped Labels

In this appendix, we provide in Fig. 10 the accuracies of models trained on FashionMNIST, CIFAR100, and TinyImageNet for Settings (S1) and (S2). These figures accompany the results presented in Sect. 7.

Fig. 10. (S1) and (S2) evaluation, grouped by attack, with mapped labels. (a) (S1) CIFAR100 (b) (S2) CIFAR100 (c) (S1) Fashion-MNIST (d) (S2) Fashion-MNIST (e) (S1) Tiny ImageNet (f) (S2) Tiny ImageNet

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhao, B.Z.H., Lu, J., Zhou, X., Vatsalan, D., Ikram, M., Kaafar, M.A. (2025). On Adversarial Training with Incorrect Labels. In: Barhamgi, M., Wang, H., Wang, X. (eds) Web Information Systems Engineering – WISE 2024. WISE 2024. Lecture Notes in Computer Science, vol 15439. Springer, Singapore. https://doi.org/10.1007/978-981-96-0573-6_9

Download citation

  • DOI: https://doi.org/10.1007/978-981-96-0573-6_9

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-0572-9

  • Online ISBN: 978-981-96-0573-6

  • eBook Packages: Computer Science, Computer Science (R0)
