Coupled-Cluster Methods for Large Molecular Systems Through Massive Parallelism and Reduced-Scaling Approaches

Virginia Tech

Accurate correlated electronic structure methods are computationally demanding and can be applied only to small molecular systems. For example, the coupled-cluster singles, doubles, and perturbative triples model (CCSD(T)), known as the "gold standard" of quantum chemistry for its accuracy, can usually treat molecules with only 20-30 atoms. To extend the reach of accurate correlated electronic structure methods to larger molecular systems, we pursue two directions: parallel computing and reduced-cost/scaling approaches. Parallel computing brings more computational resources to bear on systems that demand greater computational effort. Reduced-cost/scaling approaches, which introduce approximations into existing electronic structure methods, can significantly reduce the computation and storage requirements.

In this work, we introduce a new distributed-memory massively parallel implementation of standard and explicitly correlated (F12) coupled-cluster singles and doubles (CCSD) with canonical O(N^6) computational complexity (C. Peng, J. A. Calvin, F. Pavošević, J. Zhang, and E. F. Valeev, J. Phys. Chem. A 2016, 120, 10231), based on the TiledArray tensor framework. Excellent strong scaling is demonstrated on a multi-core shared-memory computer, a commodity distributed-memory computer, and a national-scale supercomputer. We also present a distributed-memory implementation of the density-fitting (DF) based CCSD(T) method (C. Peng, J. A. Calvin, and E. F. Valeev, in preparation for submission). An improved parallel DF-CCSD is presented utilizing lazy evaluation for tensors with more than two unoccupied indices, which makes the DF-CCSD storage requirements always smaller than those of the non-iterative triples correction (T).
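The idea of lazily evaluating density-fitted tensors can be illustrated with a minimal NumPy sketch. It is not the TiledArray-based implementation described here: the class `LazyDFIntegrals`, its `block` method, and the toy dimensions are hypothetical names chosen for illustration, and the three-index factors are random numbers rather than real integrals. The sketch assumes the standard DF factorization (ab|cd) ≈ Σ_X B[X,a,b] B[X,c,d] and shows that a tile of the four-index tensor can be formed on demand from the much smaller three-index factors.

```python
import numpy as np


class LazyDFIntegrals:
    """Hypothetical helper: builds tiles of (ab|cd) ~= sum_X B[X,a,b] * B[X,c,d] on demand."""

    def __init__(self, factors):
        # Three-index DF factors with shape (naux, nvirt, nvirt); only these are stored.
        self.factors = factors

    def block(self, sa, sb, sc, sd):
        """Return the tile of the four-index tensor selected by four slices."""
        left = self.factors[:, sa, sb]
        right = self.factors[:, sc, sd]
        # Contract over the auxiliary index X to form just the requested tile.
        return np.einsum("Xab,Xcd->abcd", left, right)


# Toy dimensions (assumed for illustration only).
rng = np.random.default_rng(0)
naux, nvirt = 12, 8
factors = rng.standard_normal((naux, nvirt, nvirt))

lazy = LazyDFIntegrals(factors)
tile = lazy.block(slice(0, 4), slice(4, 8), slice(2, 6), slice(0, 8))

# Reference: the full four-index tensor, which the lazy scheme never stores.
full = np.einsum("Xab,Xcd->abcd", factors, factors)
print("stored elements:", factors.size, "vs full tensor:", full.size)
```

In this toy setting the stored factors scale as naux × nvirt², whereas the full tensor scales as nvirt⁴, which is the kind of storage saving that lazy evaluation targets.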

Excellent strong scaling is observed on both shared-memory and distributed-memory computers equipped with conventional Intel Xeon processors and Intel Xeon Phi (Knights Landing) processors. With the new implementation, CCSD(T) energies can be evaluated for systems containing 200 electrons and 1000 basis functions in a few days on a small commodity cluster, with even larger computations possible on leadership-class computing resources. Including the F12 correction makes the CCSD(T) method converge to the basis set limit much more rapidly. The large-scale parallel explicitly correlated coupled-cluster program makes accurate estimation of the coupled-cluster basis set limit routine for molecules with 20 or more atoms. It can therefore be used to rigorously test emerging reduced-scaling coupled-cluster approaches.

Moreover, we extend the pair natural orbital (PNO) approach to excited states through the equation-of-motion coupled-cluster singles and doubles (EOM-CCSD) method (C. Peng, M. C. Clement, and E. F. Valeev, submitted). We simulate the PNO-EOM-CCSD method using an existing massively parallel canonical EOM-CCSD program. We propose the use of state-averaged PNOs, which are generated from the average of the pair densities of the excited states, to span the PNO space of all the excited states. The doubles amplitudes of the CIS(D) method are used to compute the state-averaged pair density. The issue of incorrect states entering the state-averaged pair density, caused by an energy reordering of excited states between CIS(D) and EOM-CCSD, is resolved by simply computing more states than desired. We find that with a truncation threshold of 10^-7, the truncation error in the excitation energy is already below 0.02 eV for the systems tested, while the average number of PNOs is reduced to 50-70 per pair. The accuracy of the PNO-EOM-CCSD method for local, Rydberg, and charge-transfer states is also investigated.
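The state-averaging and truncation steps can be sketched for a single orbital pair as follows. This is a minimal illustration under stated assumptions, not the method as implemented: the abstract does not give the pair-density formula, so the sketch assumes a simplified, positive semidefinite form D = T Tᵀ + Tᵀ T, and the amplitudes are random low-rank matrices standing in for CIS(D) doubles amplitudes. The function names `pair_density` and `state_averaged_pnos` are hypothetical.

```python
import numpy as np


def pair_density(t):
    """Simplified pair density (an assumption) from a (nvirt x nvirt) doubles-amplitude block."""
    return t @ t.T + t.T @ t


def state_averaged_pnos(amplitudes_per_state, threshold=1e-7):
    """Average the pair densities over states, diagonalize, and keep PNOs above the threshold."""
    densities = [pair_density(t) for t in amplitudes_per_state]
    d_avg = sum(densities) / len(densities)
    # eigh returns eigenvalues (occupation numbers) in ascending order.
    occupations, vectors = np.linalg.eigh(d_avg)
    keep = occupations > threshold
    return vectors[:, keep], occupations[keep]


# Toy data (assumed): low-rank random amplitudes for one pair in several excited states.
rng = np.random.default_rng(0)
nvirt, nstates, rank = 40, 4, 3
states = []
for _ in range(nstates):
    u = rng.standard_normal((nvirt, rank))
    v = rng.standard_normal((nvirt, rank))
    states.append(1e-2 * u @ v.T)

pnos, occ = state_averaged_pnos(states, threshold=1e-7)
print("PNOs kept for this pair:", pnos.shape[1], "of", nvirt)
```

Because a single set of PNOs is derived from the averaged density, the same truncated space serves every state included in the average, which is the property that motivates state averaging.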

Quantum Chemistry, Electronic Structure Theory, Explicitly Correlated Coupled-Cluster Methods, Parallel Computing