Knowledge-Based Techniques for Document Fraud Detection: A Comprehensive Study

Tornés, Beatriz Martínez; Boros, Emanuela; Doucet, Antoine; Gomez-Krämer, Petra; Ogier, Jean-Marc; d’Andecy, Vincent Poulain

doi:10.1007/978-3-031-24337-0_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13451)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

Due to the availability of cost-effective scanners, printers, and image processing software, document fraud is, unfortunately, quite common nowadays. The main challenges of detecting it are the lack of freely available annotated data and the predominance of computer vision approaches. We consider that relying on the textual content of forged documents could provide a different view on their detection by exploring semantic inconsistencies with the aid of specialized knowledge bases. We thus perform an exhaustive study of existing state-of-the-art methods based on knowledge-graph embeddings (KGE) using a synthetically forged, yet realistic, receipt dataset. We also explore incremental data enrichments of the knowledge base, in order to analyze the impact of the richness of the knowledge base on each KGE method. The reported results show that the performance of the methods varies considerably depending on the type of approach. Also, as expected, the gain in performance grows with the size of the data enrichment. Finally, we conclude that, while exploring the semantics of documents is promising, document forgery detection still poses a challenge for KGE methods.
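To make the idea of scoring semantic consistency with a KGE model concrete, here is a minimal sketch of the TransE scoring function (Bordes et al., listed in the references). It is an illustration only: the entity and relation names and the two-dimensional vectors are hypothetical and hand-picked, not learned, and nothing here reflects the configuration actually evaluated in the chapter.

```python
import math

# Hypothetical 2-dimensional embeddings, hand-picked for illustration only.
# A real KGE model would learn these vectors from the knowledge base.
ENTITY = {
    "company_A": (0.9, 0.1),
    "city_X": (1.0, 1.0),
    "city_Y": (-0.8, 0.3),
}
RELATION = {"located_in": (0.1, 0.9)}


def transe_score(head, relation, tail):
    """Return -||h + r - t||_2; values closer to 0 mean a more plausible triple."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# A triple consistent with the (toy) knowledge base versus an altered one.
genuine = transe_score("company_A", "located_in", "city_X")
forged = transe_score("company_A", "located_in", "city_Y")
print(genuine > forged)
```

Under this kind of scoring, a triple extracted from a document that scores much lower than its alternatives can be flagged as a candidate inconsistency.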

This work was supported by the French defense innovation agency (AID) and the VERINDOC project funded by the Nouvelle-Aquitaine Region.

Notes

1. Comités opérationnels départementaux anti-fraude (departmental operational anti-fraud committees): https://www.economie.gouv.fr/codaf-comites-operationnels-departementaux-anti-fraude.
2. https://en.wikipedia.org/wiki/SIREN_code.
3. https://en.wikipedia.org/wiki/SIRET_code.
4. https://www.insee.fr/en/accueil.
5. http://sirene.fr/siren/public/home.
6. https://api.gouv.fr/les-api/base-adresse-nationale.
7. The dataset has been split into a training set and a test set (80% and 20%, respectively) using the PyKEEN library (https://github.com/pykeen/pykeen) [2], so that no redundant triple is found in both training and test. The previously presented methods are implemented in PyKEEN, a library that we chose for its completeness, flexibility and ease of use.
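The split described in note 7 can be sketched in plain Python as follows. This is not PyKEEN's implementation, which provides its own splitting utilities; the function name `split_triples`, the seed, and the example triples are assumptions made for illustration.

```python
import random


def split_triples(triples, train_ratio=0.8, seed=0):
    """Deduplicate (head, relation, tail) triples, then split them at random."""
    unique = sorted(set(triples))  # drop repeated triples and fix an order
    rng = random.Random(seed)      # seeded so the split is reproducible
    rng.shuffle(unique)
    cut = int(len(unique) * train_ratio)
    return unique[:cut], unique[cut:]


# Hypothetical receipt triples; the last one deliberately repeats the first.
triples = [
    ("receipt_1", "issued_by", "company_A"),
    ("receipt_1", "has_total", "12.50"),
    ("receipt_2", "issued_by", "company_A"),
    ("receipt_2", "has_total", "8.00"),
    ("company_A", "located_in", "city_X"),
    ("receipt_1", "issued_by", "company_A"),
]
train, test = split_triples(triples)
print(len(train), len(test))
```

Deduplicating before the split is what guarantees that the same triple cannot land in both sets.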

References

Abiteboul, S.: Semistructured data: from practice to theory. In: Proceedings 16th Annual IEEE Symposium on Logic in Computer Science. IEEE (2001)
Ali, M., Berrendorf, M., Hoyt, C.T., Vermue, L., Sharifzadeh, S., Tresp, V., Lehmann, J.: PyKEEN 1.0: a Python library for training and evaluating knowledge graph embeddings (2020)
Artaud, C., Doucet, A., Ogier, J.M., d’Andecy, V.P.: Receipt dataset for fraud detection. In: First International Workshop on Computational Document Forensics (2017)
Artaud, C., Sidère, N., Doucet, A., Ogier, J.M., d’Andecy, V.P.: Find it! Fraud detection contest report. In: 2018 24th International Conference on Pattern Recognition (ICPR). IEEE (2018)
Artaud, C.: Détection des fraudes : de l’image à la sémantique du contenu : application à la vérification des informations extraites d’un corpus de tickets de caisse. Ph.D. thesis (2019)
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: Aberer, K., et al. (eds.) ASWC/ISWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76298-0_52
Balažević, I., Allen, C., Hospedales, T.: Multi-relational Poincaré graph embeddings. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Balažević, I., Allen, C., Hospedales, T.M.: TuckER: tensor factorization for knowledge graph completion (2019)
Barzilay, R., Lapata, M.: Modeling local coherence: an entity-based approach. Comput. Linguist. 34(1), 1–34 (2008)
Behera, T.K., Panigrahi, S.: Credit card fraud detection: a hybrid approach using fuzzy clustering & neural network. In: 2015 2nd International Conference on Advances in Computing and Communication Engineering. IEEE (2015)
Berti-Équille, L., Borge-Holthoefer, J.: Veracity of data: from truth discovery computation algorithms to models of misinformation dynamics. Synth. Lect. Data Manag. 7(3), 1–155 (2015)
Bertrand, R., Gomez-Krämer, P., Terrades, O.R., Franco, P., Ogier, J.M.: A system based on intrinsic features for fraudulent document detection. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 106–110. IEEE, Washington, DC, USA (2013)
Bertrand, R., Terrades, O.R., Gomez-Krämer, P., Franco, P., Ogier, J.M.: A conditional random field model for font forgery detection. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR). IEEE (2015)
Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1247–1250. SIGMOD’08, Association for Computing Machinery, New York, NY, USA (2008)
Bordes, A., Usunier, N., Garcia-Durán, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Proceedings of the 26th International Conference on Neural Information Processing Systems, NIPS’13, Lake Tahoe, Nevada, vol. 2, pp. 2787–2795. Curran Associates Inc., Red Hook, NY, USA (2013)
Bordes, A., Weston, J., Collobert, R., Bengio, Y.: Learning structured embeddings of knowledge bases. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence, AAAI’11, San Francisco, California, pp. 301–306. AAAI Press (2011)
Cozzolino, D., Gragnaniello, D., Verdoliva, L.: Image forgery detection through residual-based local descriptors and block-matching. In: 2014 IEEE International Conference on Image Processing (ICIP). IEEE (2014)
Cozzolino, D., Poggi, G., Verdoliva, L.: Efficient dense-field copy-move forgery detection. IEEE Trans. Inf. Forensics Secur. 10(11), 2284–2297 (2015)
Cozzolino, D., Verdoliva, L.: Camera-based image forgery localization using convolutional neural networks. In: 2018 26th European Signal Processing Conference (EUSIPCO). IEEE (2018)
Cozzolino, D., Verdoliva, L.: Noiseprint: a CNN-based camera model fingerprint (2018)
Cruz, F., Sidere, N., Coustaty, M., d’Andecy, V.P., Ogier, J.M.: Local binary patterns for document forgery detection. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1. IEEE (2017)
Dettmers, T., Minervini, P., Stenetorp, P., Riedel, S.: Convolutional 2D knowledge graph embeddings (2018)
Dong, X., et al.: Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’14, New York, New York, USA, pp. 601–610. Association for Computing Machinery, New York, NY, USA (2014)
EulerHermes-DFCG: Plus de 7 entreprises sur 10 ont subi au moins une tentative de fraude cette année. https://www.eulerhermes.fr/actualites/etude-fraude-2020.html
Fridrich, J., Kodovsky, J.: Rich models for steganalysis of digital images. IEEE Trans. Inf. Forensics Secur. 7(3), 868–882 (2012)
Galárraga, L., Teflioudi, C., Hose, K., Suchanek, F.M.: Fast rule mining in ontological knowledge bases with AMIE+. VLDB J. 24(6), 707–730 (2015). https://doi.org/10.1007/s00778-015-0394-1
Gesese, G.A., Biswas, R., Alam, M., Sack, H.: A survey on knowledge graph embeddings with literals: which model links better literal-ly? (2020)
Goyal, N., Sachdeva, N., Kumaraguru, P.: Spy the lie: fraudulent jobs detection in recruitment domain using knowledge graphs. In: Qiu, H., Zhang, C., Fei, Z., Qiu, M., Kung, S.-Y. (eds.) KSEM 2021. LNCS (LNAI), vol. 12816, pp. 612–623. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-82147-0_50
He, S., Liu, K., Ji, G., Zhao, J.: Learning to represent knowledge graphs with Gaussian embedding. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (2015)
Hitchcock, F.L.: The expression of a tensor or a polyadic as a sum of products. J. Math. Phys. 6, 1–4 (1927)
Huynh, V.P., Papotti, P.: A benchmark for fact checking algorithms built on knowledge bases. In: 28th ACM International Conference on Information and Knowledge Management, CIKM’19, 3rd-7th November 2019, Beijing, China (2019)
Ji, G., He, S., Xu, L., Liu, K., Zhao, J.: Knowledge graph embedding via dynamic mapping matrix. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (volume 1: Long papers) (2015)
Ji, S., Pan, S., Cambria, E., Marttinen, P., Yu, P.S.: A survey on knowledge graphs: representation, acquisition and applications (2020)
Kazemi, S.M., Poole, D.: SimplE embedding for link prediction in knowledge graphs (2018)
Kim, J., Kim, H.-J., Kim, H.: Fraud detection for job placement using hierarchical clusters-based deep neural networks. Appl. Intell. 49(8), 2842–2861 (2019). https://doi.org/10.1007/s10489-019-01419-2
Kowshalya, G., Nandhini, M.: Predicting fraudulent claims in automobile insurance. In: 2018 2nd International Conference on Inventive Communication and Computational Technologies (ICICCT). IEEE (2018)
Li, Y., Yan, C., Liu, W., Li, M.: Research and application of random forest model in mining automobile insurance fraud. In: 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). IEEE (2016)
Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015)
Mishra, A., Ghorpade, C.: Credit card fraud detection on the skewed data using various classification and ensemble techniques. In: 2018 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS). IEEE (2018)
Nickel, M., Rosasco, L., Poggio, T.: Holographic embeddings of knowledge graphs. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
Nickel, M., Tresp, V., Kriegel, H.P.: A three-way model for collective learning on multi-relational data. In: Proceedings of the 28th International Conference on Machine Learning, ICML’11, pp. 809–816 (2011)
Rabah, C.B., Coatrieux, G., Abdelfattah, R.: The Supatlantique scanned documents database for digital image forensics purposes. In: 2020 IEEE International Conference on Image Processing (ICIP). IEEE (2020)
Rizki, A.A., Surjandari, I., Wayasti, R.A.: Data mining application to detect financial fraud in Indonesia’s public companies. In: 2017 3rd International Conference on Science in Information Technology (ICSITech). IEEE (2017)
Rossi, A., Firmani, D., Matinata, A., Merialdo, P., Barbosa, D.: Knowledge graph embedding for link prediction: a comparative analysis (2020)
Rossi, A., Matinata, A.: Knowledge graph embeddings: are relation-learning models learning relations? In: EDBT/ICDT Workshops (2020)
Shen, A., Mistica, M., Salehi, B., Li, H., Baldwin, T., Qi, J.: Evaluating document coherence modeling. Trans. Assoc. Comput. Linguist. 9, 621–640 (2021)
Shi, B., Weninger, T.: ProjE: embedding projection for knowledge graph completion. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31 (2017)
Sidere, N., Cruz, F., Coustaty, M., Ogier, J.M.: A dataset for forgery detection and spotting in document images. In: 2017 7th International Conference on Emerging Security Technologies (EST). IEEE (2017)
Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge. In: Proceedings of the 16th International Conference on World Wide Web, WWW’07, pp. 697–706. Association for Computing Machinery, New York, NY, USA (2007)
Sun, Z., Deng, Z.H., Nie, J.Y., Tang, J.: RotatE: knowledge graph embedding by relational rotation in complex space (2019)
Thorne, J., Vlachos, A.: Automated fact checking: task formulations, methods and future directions. CoRR (2018)
Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., Bouchard, G.: Complex embeddings for simple link prediction (2016)
Van Der Maaten, L.: Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 15(1), 3221–3245 (2014)
Vidros, S., Kolias, C., Kambourakis, G., Akoglu, L.: Automatic detection of online recruitment frauds: characteristics, methods, and a public dataset. Future Internet 9(1), 6 (2017)
Wang, Q., Mao, Z., Wang, B., Guo, L.: Knowledge graph embedding: a survey of approaches and applications. IEEE Trans. Knowl. Data Eng. 29(12), 2724–2743 (2017)
Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28 (2014)
Yang, B., Yih, W.-t., He, X., Gao, J., Deng, L.: Embedding entities and relations for learning and inference in knowledge bases (2015)
Zhang, S., Tay, Y., Yao, L., Liu, Q.: Quaternion knowledge graph embeddings (2019)
Zhang, W., Paudel, B., Zhang, W., Bernstein, A., Chen, H.: Interaction embeddings for prediction and explanation in knowledge graphs. In: Proceedings of the 12th ACM International Conference on Web Search and Data Mining (2019)

Author information

Authors and Affiliations

University of La Rochelle, L3i, F-17000, La Rochelle, France
Beatriz Martínez Tornés, Emanuela Boros, Antoine Doucet, Petra Gomez-Krämer & Jean-Marc Ogier
Yooz, 1 Rue Fleming, 17000, La Rochelle, France
Vincent Poulain d’Andecy

Corresponding author

Correspondence to Antoine Doucet.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

Cite this paper

Tornés, B.M., Boros, E., Doucet, A., Gomez-Krämer, P., Ogier, JM., d’Andecy, V.P. (2023). Knowledge-Based Techniques for Document Fraud Detection: A Comprehensive Study. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_2

DOI: https://doi.org/10.1007/978-3-031-24337-0_2
Published: 26 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24336-3
Online ISBN: 978-3-031-24337-0
eBook Packages: Computer Science, Computer Science (R0)
