Abstract: Emerging heterogeneous high-performance computing architectures with discrete GPUs and/or many-core processors, especially in the era of exascale computing, require several levels of parallelization when designing massively parallel modeling codes for nonlinear plasma physics studies. In this presentation, we introduce a multilevel parallelization technique that combines the Message Passing Interface (MPI) with thread-level, concurrency-based task parallelization. In this method, the allocated computing resources are recursively divided into many small group communicators, each of which serves as a subcomponent that executes one physics component. Within each compute node, thread-level, concurrency-based task parallelization is implemented with OpenMP. With these multilevel parallelization techniques, we demonstrate excellent performance scalability in several example applications: a particle-in-cell (PIC) simulation code, an adaptive mesh refinement MHD simulation code, and an integrated whole-device modeling code (TRANSP).
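The recursive division of resources into group communicators, combined with task parallelism inside a node, can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, models ranks as plain integers instead of calling MPI, and uses a thread pool as a stand-in for OpenMP tasks. The names `split_ranks`, `build_hierarchy`, `run_node_tasks`, and the `fanouts` parameter are hypothetical and chosen for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranks(ranks, n_groups):
    """Divide a list of ranks into n_groups contiguous, near-equal groups.

    Assumption: this mimics the grouping obtained when every rank passes
    its group index as the "color" to an MPI communicator split.
    """
    size = len(ranks)
    groups = []
    start = 0
    for g in range(n_groups):
        # The first (size % n_groups) groups receive one extra rank.
        count = size // n_groups + (1 if g < size % n_groups else 0)
        groups.append(ranks[start:start + count])
        start += count
    return groups


def build_hierarchy(ranks, fanouts):
    """Recursively split ranks; fanouts[i] is the number of sub-groups at level i."""
    if not fanouts:
        return ranks  # leaf group: would run one physics component
    return [build_hierarchy(group, fanouts[1:])
            for group in split_ranks(ranks, fanouts[0])]


def run_node_tasks(tasks, n_threads):
    """Run independent callables concurrently on one node.

    Assumption: a thread pool stands in for OpenMP task parallelism;
    results are returned in the order the tasks were given.
    """
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda task: task(), tasks))


if __name__ == "__main__":
    # Eight ranks, split into 2 groups, each split again into 2 sub-groups.
    print(build_hierarchy(list(range(8)), [2, 2]))
    # Two independent tasks executed by two threads.
    print(run_node_tasks([lambda: sum(range(10)), lambda: max(3, 7)], 2))
```

In an actual MPI code, each leaf group would correspond to its own communicator, so that collective operations of one physics component stay confined to the ranks assigned to it.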