DOI: 10.1145/3225058.3225140
research-article

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model

Published: 13 August 2018

Abstract

The dynamical core is one of the most time-consuming parts of a global atmospheric general circulation model, which is widely used to numerically simulate the dynamic evolution of the global atmosphere. Because of its complicated calculation procedures and the non-uniformity of the latitude-longitude mesh, its parallelization suffers from high communication overhead. In this paper, we derive the operator form of the calculation flow in the dynamical core and show that its basic operation is an alternation of stencil computation and collective communication. Based on this operator form, we propose a corresponding optimization strategy for each operator. Finally, we develop a communication-avoiding algorithm to reduce communication overhead in the dynamical core. Our experiments show that the communication-avoiding algorithm reduces the total runtime by up to 54% for a 10-year run of a 50 km resolution model. For communication in particular, the new algorithm achieves an average speedup of 1.4x for collective communication and an average speedup of 3.9x for the communication involved in the stencil computation.
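
The abstract does not spell out the algorithm itself, so the following is only a generic illustration of one well-known communication-avoiding idea for stencil codes: exchange a wider ("deep") halo once, then take several local time steps with some redundant computation. It is a sketch under stated assumptions and not the paper's method. The 1-D periodic 3-point stencil, the in-process simulation of ranks, and the names `stencil_step` and `run` are all hypothetical.

```python
# Illustrative only: a generic deep-halo scheme for a 1-D periodic 3-point
# stencil. This is NOT the paper's algorithm. Ranks are simulated in one
# process, and all function names are hypothetical.

def stencil_step(u):
    """One 3-point averaging step; the output is 2 points shorter than the input."""
    return [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]


def run(global_u, num_ranks, steps, halo):
    """Advance `steps` time steps with one `halo`-wide exchange every `halo` steps.

    Returns the final global field and the number of exchange rounds.
    """
    n = len(global_u)
    assert n % num_ranks == 0 and steps % halo == 0
    local_n = n // num_ranks
    assert halo <= local_n  # ghost data comes from nearest neighbours only
    blocks = [global_u[r * local_n:(r + 1) * local_n] for r in range(num_ranks)]
    exchanges = 0
    for _ in range(steps // halo):
        # Simulated communication round: each rank receives `halo` ghost
        # points on each side from its periodic neighbours.
        field = [x for b in blocks for x in b]
        exchanges += 1
        new_blocks = []
        for r in range(num_ranks):
            start = r * local_n
            ext = [field[(start - halo + j) % n] for j in range(local_n + 2 * halo)]
            # Local computation: each step consumes one ghost layer per side,
            # so after `halo` steps exactly the owned points remain.
            for _ in range(halo):
                ext = stencil_step(ext)
            new_blocks.append(ext)
        blocks = new_blocks
    return [x for b in blocks for x in b], exchanges


u0 = [float(i % 5) for i in range(16)]
baseline, rounds_per_step = run(u0, num_ranks=4, steps=4, halo=1)
deep, rounds_deep = run(u0, num_ranks=4, steps=4, halo=4)
```

In this toy setting both runs produce the same field, while the deep-halo run needs one exchange round instead of four. The price is redundant computation in the ghost region, which is the usual trade-off in schemes of this kind.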

Information & Contributors

Information

Published In

ICPP '18: Proceedings of the 47th International Conference on Parallel Processing
August 2018
945 pages
ISBN: 9781450365109
DOI: 10.1145/3225058
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • University of Oregon

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. atmospheric general circulation model
  2. collective communication
  3. communication avoiding
  4. operator form of calculation flow
  5. stencil computation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2018

Acceptance Rates

ICPP '18 Paper Acceptance Rate 91 of 313 submissions, 29%;
Overall Acceptance Rate 91 of 313 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 4
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 09 Jan 2025

Cited By

  • (2024) Pipe-AGCM: A Fine-Grain Pipelining Scheme for Optimizing the Parallel Atmospheric General Circulation Model. Euro-Par 2024: Parallel Processing, pp. 283-297. DOI: 10.1007/978-3-031-69583-4_20. Online publication date: 26-Aug-2024
  • (2023) AGCM-3DLF: Accelerating Atmospheric General Circulation Model via 3-D Parallelization and Leap-Format. IEEE Transactions on Parallel and Distributed Systems 34:3, pp. 766-780. DOI: 10.1109/TPDS.2022.3231013. Online publication date: 1-Mar-2023
  • (2022) W-cycle SVD. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-16. DOI: 10.5555/3571885.3571994. Online publication date: 13-Nov-2022
  • (2022) W-Cycle SVD: A Multilevel Algorithm for Batched SVD on GPUs. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-16. DOI: 10.1109/SC41404.2022.00087. Online publication date: Nov-2022
  • (2021) I/O lower bounds for auto-tuning of convolutions in CNNs. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 247-261. DOI: 10.1145/3437801.3441609. Online publication date: 17-Feb-2021
  • (2020) A Highly Efficient Dynamical Core of Atmospheric General Circulation Model based on Leap-Format. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 95-104. DOI: 10.1109/IPDPS47924.2020.00020. Online publication date: May-2020
  • (2019) Trade-offs between computation, communication, and synchronization in stencil-collective alternate update. CCF Transactions on High Performance Computing 1:2, pp. 144-160. DOI: 10.1007/s42514-019-00011-x. Online publication date: 26-Jul-2019
