For more than four years, a team from New Mexico State University’s Klipsch School of Electrical and Computer Engineering in the College of Engineering has collaborated on high-performance computing research with researchers from the Los Alamos National Laboratory. One product of this partnership is a peer-reviewed technical paper that was accepted and presented at Supercomputing 2021, which took place in St. Louis, Missouri, in November 2021.
The paper, “Hybrid, scalable, trace-driven performance modeling of general-purpose computing on graphics processing units,” is the first from NMSU selected for Supercomputing, an international conference for high-performance computing, networking, storage and analysis, since 2012. The project is a joint effort between graduate students and faculty from NMSU and Los Alamos National Laboratory.
“We were very excited. This was the first submission for the paper,” said Abdel-Hameed Badawy, Klipsch School of Electrical and Computer Engineering associate professor and one of the paper’s authors. “Getting into Supercomputing is not an easy feat. It needs a lot of work. It’s an achievement to get into the SC conference.”
To present at the hybrid-format conference, Badawy traveled to St. Louis with Yehia Arafa, the lead author who graduated from NMSU in December 2021 with a Ph.D. in computer engineering and joined QUALCOMM research as a senior engineer in January 2022.
“I am very excited to see my Ph.D. work presented at a high-impact conference like SC,” Arafa said. “The key idea was to have a fast, accurate and scalable tool that the high-performance computing community could use in their GPU modeling and simulation research.”
The other authors include NMSU graduate students Ammar ElWazir, who is currently with AMD; Atanu Barai, who is currently with Intel; Ali Eker, AMD senior software engineer; Gopinath Chennupati, formerly with LANL and currently with Amazon; and Nandakishore Santhi and Stephan Johannes Eidenbenz, both computer scientists with LANL’s Information Science group.
The paper describes the Performance Prediction Toolkit-Graphics Processing Unit (PPT-GPU), a hardware-software co-design and performance prediction framework for HPC applications that run on general-purpose GPUs. It has 10 to 30 times the performance and scalability of competing tools.
“The basic value proposition of the PPT-GPU toolkit is that it offers significant speed-up – up to three orders of magnitude over alternative tools with almost no penalty in prediction accuracy,” Eidenbenz said. “Efficiently leveraging GPU architectures is a formidable challenge. It has been a great experience for the laboratory to partner with NMSU and take advantage of academic resources and capabilities from within New Mexico.”
“Performance prediction of scientific codes on general purpose GPUs is a typically error-prone and a time-consuming process,” Santhi said. “Our approach with the PPT-GPU toolkit means science applications that previously took weeks to simulate and analyze can now take just a few hours, speeding up the software-hardware co-design process considerably. Improved performance of GPUs will benefit research areas, including machine learning and artificial intelligence, drug discovery and medicine, and many other applications where computing is integral.”
Badawy said he is immensely proud of the accomplishment and hopes HPC will become a significant research focus at NMSU. He believes publishing and presenting can elevate NMSU’s national status and serve as a springboard to other opportunities and collaborations for researchers, faculty, staff and students across campus.
To read the paper, visit https://dl.acm.org/doi/abs/10.1145/3458817.3476221.