DNS Request Log Analysis of Universities in Shanghai: A CDN Service Provider’s Perspective
Abstract
1. Introduction
- We select DNS request log data spanning more than one month, covering requests from university Internet users in Shanghai, and conduct a systematic analysis. We find that DNS requests from different types of universities show distinctive intraday fluctuation patterns, as well as significant differences between semesters and holidays.
- We analyze the usage shares of representative CDN service providers over IPv4 and IPv6. We find that some mainstream providers show large differences between their IPv4 and IPv6 shares, while for others the two shares move in tandem.
- To improve the dynamic resource scheduling of CDN service providers, we adopt three types of models, i.e., statistical models, traditional machine learning models, and deep learning models, to predict how the number of DNS requests to CDN service providers changes over time. Results show that deep learning models achieve the highest prediction accuracy, reaching a mean absolute percentage error (MAPE) below 3%, which is of practical value.
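As a rough illustration of the second contribution, the sketch below computes each CDN service provider's share of requests separately within IPv4 and within IPv6, so the two shares of one provider can be compared side by side. It assumes, hypothetically, that each log record has already been reduced to a `(provider, protocol)` pair with `protocol` equal to `"IPv4"` or `"IPv6"`; this record format, the function name `protocol_shares`, and the toy data are illustrative and not taken from the dataset.

```python
from collections import Counter, defaultdict

def protocol_shares(records):
    """Return {protocol: {provider: share}}; shares within each protocol sum to 1.

    `records` is an iterable of hypothetical (provider, protocol) pairs,
    one per DNS request already attributed to a CDN service provider.
    """
    counts = defaultdict(Counter)  # protocol -> request count per provider
    for provider, protocol in records:
        counts[protocol][provider] += 1
    shares = {}
    for protocol, per_provider in counts.items():
        total = sum(per_provider.values())
        shares[protocol] = {p: n / total for p, n in per_provider.items()}
    return shares

# Toy records for illustration only (not from the dataset).
logs = [
    ("Alibaba Cloud", "IPv4"), ("Alibaba Cloud", "IPv4"), ("Tencent Cloud", "IPv4"),
    ("Alibaba Cloud", "IPv6"), ("Tencent Cloud", "IPv6"),
    ("Tencent Cloud", "IPv6"), ("Tencent Cloud", "IPv6"),
]
print(protocol_shares(logs))
```

In this toy input, Alibaba Cloud holds two thirds of the IPv4 requests but only a quarter of the IPv6 requests, the kind of IPv4/IPv6 divergence the comparison is meant to surface.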
2. Related Work
2.1. DNS Log
2.2. User Actions of CERNET
2.3. Request Prediction for CDNs
3. Dataset
3.1. Dataset Overview
3.2. Dataset Description
4. Representative CDN Service Providers
5. Prediction of the Numbers of DNS Requests of CDNs
5.1. Task Definition
5.2. Methods and Models
5.2.1. Statistical Models
5.2.2. Classical Machine Learning Models
5.2.3. Deep Learning Models
5.3. Results
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
DNS | Domain Name System |
CDN | Content delivery network |
IPv4 | Internet Protocol version 4 |
IPv6 | Internet Protocol version 6 |
ISP | Internet service provider |
CERNET | China Education and Research Network |
RMSE | Root-mean-square error |
MAE | Mean absolute error |
MAPE | Mean absolute percentage error |
SVM | Support vector machine |
SVR | Support vector regression |
GBDT | Gradient boosting decision tree |
LSTM | Long short-term memory |
T-LSTM | Time-aware LSTM |
RNN | Recurrent neural network |
SARIMAX | Seasonal autoregressive integrated moving average with exogenous regressors |
CDN Service Provider | Domain Name Suffixes |
---|---|
Huawei Cloud | c.cdnhwc1.com; c.cdnhwc2.com; c.cdnhwc3.com |
Tencent Cloud | dnspod.com; dnsv1.com; *tcdn.qq.com; tcdnlive.com; tdnsv5.com |
Alibaba Cloud | aliyundoc.com.cn; aliyundoc.com; w.kunlunsl.com; cdngslb.com; kunluncom; tbcache.com; alicdn.com |
Kingsoft Cloud | ks-cdn.com |
China Mobile | cdn.10086.cn |
Baishan Cloud | qingcdn.com; bsclink.cn; trpcdn.net; bsgslb.cn |
Baidu Cloud | bdydns.com; jomodns.com |
China Telecom | ctycdn.com |
ByteDance Cloud | bytedns.net; cdn.bytedance.com; bytefcdn.com |
JD Cloud | jcloud-cdn.com; jdcdn.com |
Wangsu | wsdvs.com; wscdns.com; wsglb0.com; cdn20.com; qtlcdn.com; mwcloudcdn.com; lxdns.co |
UCloud | ucloud.com.cn |
Qiniu Cloud | qiniudns.com |
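The table above attributes a DNS request to a CDN service provider by the suffix of a domain name observed in the log. A minimal sketch of such a lookup follows. It assumes matching on whole dot-separated labels, so that a name like `notalicdn.com` does not match `alicdn.com`; which name in a log record is matched (the query name or a CNAME target) is left open here. Only a subset of the listed suffixes is included, wildcard entries such as `*tcdn.qq.com` are omitted because they would need plain string-suffix matching, and the names `SUFFIXES` and `match_provider` are hypothetical.

```python
# Subset of the domain-name suffixes listed in the table above.
SUFFIXES = {
    "c.cdnhwc1.com": "Huawei Cloud",
    "dnsv1.com": "Tencent Cloud",
    "alicdn.com": "Alibaba Cloud",
    "tbcache.com": "Alibaba Cloud",
    "ks-cdn.com": "Kingsoft Cloud",
    "cdn.10086.cn": "China Mobile",
    "bdydns.com": "Baidu Cloud",
    "ctycdn.com": "China Telecom",
    "bytefcdn.com": "ByteDance Cloud",
    "jdcdn.com": "JD Cloud",
    "wscdns.com": "Wangsu",
    "qiniudns.com": "Qiniu Cloud",
}

def match_provider(name):
    """Return the provider whose suffix matches `name` on a label boundary, else None."""
    # Normalize case and drop the trailing root dot of a fully qualified name.
    labels = name.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first so that more specific entries win.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SUFFIXES:
            return SUFFIXES[candidate]
    return None

print(match_provider("img.alicdn.com."))
print(match_provider("example.org"))
```

Looking up each candidate suffix in a dictionary keeps the cost proportional to the number of labels in the name rather than to the number of listed suffixes.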
Dataset | Metric | SARIMAX | SVR | XGBoost | LightGBM | LSTM | T-LSTM | LSTM-Attention |
---|---|---|---|---|---|---|---|---|
Alibaba Cloud | RMSE (×10) | 17.5 | 203 | 8.79 | 7.83 | 7.07 | 6.63 | 5.25 |
Alibaba Cloud | MAE (×10) | 13.8 | 48.3 | 6.00 | 4.91 | 4.32 | 4.27 | 3.83 |
Alibaba Cloud | MAPE (%) | 11.6 | 56.4 | 4.50 | 3.98 | 3.04 | 2.91 | 2.87 |
Tencent Cloud | RMSE (×10) | 21.1 | 42.1 | 10.0 | 8.46 | 5.08 | 4.14 | 4.06 |
Tencent Cloud | MAE (×10) | 17.7 | 35.9 | 6.96 | 6.03 | 3.22 | 2.92 | 2.89 |
Tencent Cloud | MAPE (%) | 25.0 | 50.7 | 6.83 | 5.41 | 2.84 | 3.02 | 2.96 |
13 CDN service providers | RMSE (×10) | 63.9 | 57.1 | 31.1 | 21.6 | 22.0 | 23.0 | 17.2 |
13 CDN service providers | MAE (×10) | 56.0 | 173 | 33.2 | 22.1 | 14.5 | 13.4 | 13.0 |
13 CDN service providers | MAPE (%) | 10.6 | 48.2 | 4.81 | 3.96 | 2.68 | 2.72 | 2.66 |
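The table reports RMSE, MAE, and MAPE. The sketch below uses the standard definitions of these metrics for actual values $y_t$ and predictions $\hat{y}_t$ over $n$ time steps; the paper's exact computation (e.g., time-slot granularity or scaling) is not shown here, so this is an assumption, and the toy request counts are illustrative only. MAPE is undefined when an actual value is zero.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error: sqrt(mean((y - y_hat)^2))."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: mean(|y - y_hat|)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error in percent; every actual value must be nonzero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Toy DNS request counts per time slot (illustrative, not from the dataset).
y_true = [100, 200, 400]
y_pred = [110, 190, 400]
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

Because MAPE is scale-free, it can be compared directly across the Alibaba Cloud, Tencent Cloud, and 13-provider series, whereas RMSE and MAE depend on each series' request volume.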
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sun, Z.; Guo, T.; Luo, S.; Zhuang, Y.; Ma, Y.; Chen, Y.; Wang, X. DNS Request Log Analysis of Universities in Shanghai: A CDN Service Provider’s Perspective. Information 2022, 13, 542. https://doi.org/10.3390/info13110542