Abstract
Grouping strokes into different categories is an essential processing step in the automatic analysis of online handwritten documents. The technical challenges stem from variation in handwriting style, content heterogeneity, and the lack of prior layout knowledge. In this work, we propose the edge graph attention network (EGAT) to address the stroke classification problem. In this framework, stroke classification is formulated as a node classification problem in a relational graph, which is constructed from the temporal and spatial relationships between strokes. Distributed node and edge features for classification are then learned by stacking multiple edge graph attention layers, in which various attention mechanisms are exploited to aggregate information from neighboring nodes. In the task of text/nontext classification, the proposed model achieves accuracies of 98.65% and 98.90% on the IAMOnDo and Kondate datasets, respectively. In the task of multi-class classification, the achieved accuracies are 95.81%, 97.36% and 99.05% on the IAMOnDo, FC and FA datasets, respectively. In addition, we conduct ablation experiments to quantitatively and qualitatively evaluate the key modules of our model.
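To make the formulation concrete, the following is a minimal sketch of the two ideas named in the abstract: building a stroke graph from temporal and spatial relationships, and aggregating neighbor information with attention. It is not the EGAT layer itself. The spatial criterion (centroid distance below a hypothetical `dist_thresh`), the function names, and the single-head node-only attention (in the style of a plain graph attention layer, without learned edge features) are all assumptions made for illustration.

```python
import numpy as np

def build_stroke_graph(strokes, dist_thresh=10.0):
    """Return a symmetric boolean adjacency matrix over strokes.

    strokes: list of (n_i, 2) arrays of pen points, in writing order.
    Temporal edges link consecutive strokes. Spatial edges link strokes
    whose centroids are closer than dist_thresh (an assumed criterion).
    """
    n = len(strokes)
    centroids = np.stack([s.mean(axis=0) for s in strokes])
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n - 1):  # temporal neighbors
        adj[i, i + 1] = adj[i + 1, i] = True
    dists = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
    adj |= dists < dist_thresh  # spatial neighbors; also adds self-loops
    return adj

def attention_aggregate(h, adj, W, a):
    """Single-head attention over graph neighbors (node features only).

    h: (n, d_in) node features, W: (d_in, d_out) projection,
    a: (2 * d_out,) attention vector. Returns new features and weights.
    """
    z = h @ W
    d = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into two dot products
    e = (z @ a[:d])[:, None] + (z @ a[d:])[None, :]
    e = np.where(e > 0, e, 0.2 * e)
    e = np.where(adj, e, -np.inf)        # mask out non-neighbors
    e -= e.max(axis=1, keepdims=True)    # numerically stable softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha
```

Each row of `alpha` sums to one over that stroke's neighbors, so the output for a stroke is a weighted average of its neighbors' projected features. A classifier over the final node features would then assign each stroke its label.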
Acknowledgements
This work has been supported in part by the National Key Research and Development Program Grant 2018YFB1005000 and the National Natural Science Foundation of China (NSFC) Grants 61773376 and 61721004.
Ethics declarations
Conflict of interest
This work has no conflict of interest with any personal or funding parties.
Additional information
This article is part of the topical collection "Document Analysis and Recognition" guest edited by Michael Blumenstein, Seiichi Uchida and Cheng-Lin Liu.
Appendix
Dataset Statistics
This section is supplementary to Section 4.1 and presents the statistics of each dataset.
Hyperparameters
This section is supplementary to Section 3.4 and presents the chosen hyperparameters for all experiments. For all edge attention layers, the hyperparameters of each layer, \((C', D', K)\), are kept the same. We tune the hyperparameters on the validation set by random search.
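As an illustration of the tuning procedure, the following is a minimal sketch of random search over a shared per-layer setting \((C', D', K)\). The candidate values, the key names `C_out`, `D_out` and `K`, the number of trials, and the `evaluate` callback (assumed to train a model with the given configuration and return its validation accuracy) are all hypothetical and are not taken from the paper.

```python
import random

def random_search(evaluate, space, n_trials=20, seed=0):
    """Sample configurations uniformly and keep the best-scoring one.

    evaluate: callable mapping a config dict to a validation score
              (higher is better); assumed to be supplied by the user.
    space:    dict mapping each hyperparameter name to candidate values.
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical search space; one setting is shared by all layers.
space = {"C_out": [16, 32, 64], "D_out": [16, 32, 64], "K": [2, 4, 8]}
```

Because the same \((C', D', K)\) is used for every layer, the search space stays three-dimensional regardless of network depth.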
Cite this article
Ye, JY., Zhang, YM., Yang, Q. et al. Contextual Stroke Classification in Online Handwritten Documents with Edge Graph Attention Networks. SN COMPUT. SCI. 1, 163 (2020). https://doi.org/10.1007/s42979-020-00177-0
DOI: https://doi.org/10.1007/s42979-020-00177-0