Abstract
Predicting the next item of a sequence over a finite alphabet has important applications in many domains. In this paper, we present a novel prediction model named CPT (Compact Prediction Tree) which losslessly compress the training data so that all relevant information is available for each prediction. Our approach is incremental, offers a low time complexity for its training phase and is easily adaptable for different applications and contexts. We compared the performance of CPT with state of the art techniques, namely PPM (Prediction by Partial Matching), DG (Dependency Graph) and All-K-th-Order Markov. Results show that CPT yield higher accuracy on most datasets (up to 12% more than the second best approach), has better training time than DG and PPM, and is considerably smaller than All-K-th-Order Markov.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Arlitt, M., Jin, T.: A workload characterization study of the 1998 world cup web site. IEEE Network 14(3), 30–37 (2000)
Cleary, J., Witten, I.: Data compression using adaptive coding and partial string matching. IEEE Trans. on Inform. Theory 24(4), 413–421 (1984)
Deshpande, M., Karypis, G.: Selective Markov models for predicting Web page accesses. ACM Transactions on Internet Technology 4(2), 163–184 (2004)
Fournier-Viger, P., Gueniche, T., Tseng, V.S.: Using Partially-Ordered Sequential Rules to Generate More Accurate Sequence Prediction. In: Zhou, S., Zhang, S., Karypis, G. (eds.) ADMA 2012. LNCS (LNAI), vol. 7713, pp. 431–442. Springer, Heidelberg (2012)
Padmanabhan, V.N., Mogul, J.C.: Using Prefetching to Improve World Wide Web Latency. Computer Communications 16, 358–368 (1998)
Domenech, J., de la Ossa, B., Sahuquillo, J., Gil, J.A., Pont, A.: A taxonomy of web prediction algorithms. Expert Systems with Applications (9) (2012)
Papapetrou, P., Kollios, G., Sclaroff, S., Gunopulos, D.: Discovering Frequent Arrangements of Temporal Intervals. In: Proc. of the 5th IEEE International Conference on Data Mining, pp. 354–361 (2005)
Pitkow, J., Pirolli, P.: Mining longest repeating subsequence to predict world wide web surfing. In: Proc. 2nd USENIX Symposium on Internet Technologies and Systems, Boulder, CO, pp. 13–25 (1999)
Sun, R., Giles, C.L.: Sequence Learning: From Recognition and Prediction to Sequential Decision Making. IEEE Intelligent Systems 16(4), 67–70 (2001)
Willems, F., Shtarkov, Y., Tjalkens, T.: The context-tree weighting method: Basic properties. IEEE Trans. on Information Theory 31(3), 653–664 (1995)
Zheng, Z., Kohavi, R., Mason, L.: Real world performance of association rule algorithms. In: Proc. 7th ACM Intern. Conf. on KDD, pp. 401–406 (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gueniche, T., Fournier-Viger, P., Tseng, V.S. (2013). Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science(), vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_16
Download citation
DOI: https://doi.org/10.1007/978-3-642-53917-6_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53916-9
Online ISBN: 978-3-642-53917-6
eBook Packages: Computer ScienceComputer Science (R0)