HPCA 2020 Trip Report

This article originally appeared on the SIGARCH blog. Highlights of the business meeting were added for this version.

By all accounts, HPCA (and the co-located PPoPP and CGO) was a tremendous success, with fabulous keynotes, many ground-breaking papers, and an almost too-wonderful setting in sunny San Diego.

Keynotes 

The first day opened with a keynote from Josep Torrellas of UIUC, who argued that research will and should become more interdisciplinary over time. Three reasons support this view:

  1. Our community is growing, including internationally, which means more opportunity for broad research.
  2. Many new topics that go beyond core architecture research are becoming popular (security, programming, OS, algorithms, circuits, bio, communication).
  3. Government funding is not growing, and most opportunities involve multiple research areas.

Balance between core and interdisciplinary research, both in funding and in research priorities, remains an important question.

The second keynote was by Michael Garland, director of research at NVIDIA.  He spoke about the challenges of low-level programming for parallel hardware and the need for higher-level programming constructs that expose parallelism.  A simple example of spawning multiple GPU threads and waiting on them showed that poor abstractions can yield both no parallelism and high synchronization overhead.  Addressing these challenges, he first described upcoming support in NVIDIA's compilers for C++17 Parallel Algorithms, which enables transparent acceleration on GPUs.  A newer project is Legate, which lets programs that use the popular NumPy array-programming library be transparently parallelized by changing a single import statement. It builds on the Legion programming model and highly parallel, decentralized task scheduling across many nodes and GPUs.
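To illustrate the "change a single import" idea from the Legate talk, here is a minimal, stdlib-only sketch. The two backend modules below are toy stand-ins (not real NumPy or Legate code, and the `legate.numpy` name in the comment is as presented in the talk, not verified); the point is only that application code written against one array API can be redirected to another backend without touching anything but the import.

```python
# Toy sketch of the "swap one import" pattern described in the Legate talk.
# In a real program the swap would look something like (name unverified):
#     import legate.numpy as np    # instead of: import numpy as np
# Everything below fakes two interchangeable backends so the example runs
# with only the standard library.

import types

def make_backend(name):
    """Build a toy module exposing a NumPy-like dot() function."""
    mod = types.ModuleType(name)
    def dot(a, b):
        # Naive sequential inner product; a real accelerated backend
        # would distribute this work across GPUs.
        return sum(x * y for x, y in zip(a, b))
    mod.dot = dot
    return mod

numpy_stub = make_backend("numpy_stub")    # stands in for NumPy
legate_stub = make_backend("legate_stub")  # stands in for Legate's NumPy

# Application code is written once against the shared API:
def app(np):
    return np.dot([1, 2, 3], [4, 5, 6])

# Swapping the backend is the only change the application sees.
print(app(numpy_stub))   # 32
print(app(legate_stub))  # 32
```

Because both backends expose the same interface, the application is oblivious to which one executes the work; this is the property that lets Legate parallelize unmodified NumPy programs.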

Chris Lattner (SiFive) and Tatiana Shpeisman (Google) made the case for MLIR, a new, flexible compiler intermediate representation.  They pointed out that many compiler ecosystems (e.g., LLVM, TensorFlow) contain a variety of graph IRs at different levels; these similar-but-different technologies duplicate infrastructure, are fragile, and offer poor support for diagnosing cross-cutting failures.  MLIR's goal is to provide domain-specific optimization capability while remaining general enough to serve multiple levels of compiler infrastructure and multiple communities.  MLIR's design borrows heavily from LLVM but enables greater flexibility and extensibility. One of its key innovations is the ability to express "dialects": families of operations and types useful at a particular level.  Chris ended the talk with an impassioned case for eventually replacing Clang and LLVM IR with MLIR, even though it will be a long and difficult task.

Best Paper Session

The best paper award went to "SIGMA," by Qin et al. from Georgia Tech and Intel.  SIGMA is a reconfigurable architecture for DNN training that offers greater flexibility across kernel shapes and efficient support for sparsity, enabled by a novel reduction-tree network and a supporting compiler. The runner-up was by Lin et al. from USC and Oregon State, who developed a deep reinforcement-learning framework for exploring NoC designs.  These works highlight the importance of architecture innovation for machine learning, and machine learning's potential to enable powerful architecture optimization.

Other works in this session included "EMSim," by Sehatbakhsh et al. from Georgia Tech, which enables simulation of magnetic side-channel attacks on processor pipelines, and "Impala," by Sadredini et al. from the University of Virginia, which develops an algorithm/architecture co-design for pattern-matching automata.

Test of Time Award

The HPCA Test of Time (ToT) award recognizes the most influential papers published at earlier HPCA conferences that have had significant impact on the field. This year's award went to "A Delay Model and Speculative Architecture for Pipelined Routers," from HPCA-7 (2001), by Li-Shiuan Peh and Bill Dally.  This work was notable both for shifting the community's focus from off-chip to on-chip networks and for serving as a standard reference and analytical modeling tool for the community.

Key Sessions

One highlight was the "Back to the Future" vision talks, hosted by Yan Solihin.  Steven Swanson (UCSD) took us on a journey to a parallel universe called "core-world," in which core memory (non-volatile memory, NVM) rather than DRAM became the dominant storage technology for the last several decades.  In core-world, many of the challenges with NVM would have been solved earlier, and it is up to us to invest sufficient time in our world for NVM to "catch up" to where we could have been in core-world.  Hsien-Hsin Sean Lee of Facebook discussed the limits of specialization for ML workloads and beyond.  While it seems like we are hitting the physical limits of transistors, we are still eight orders of magnitude away from fundamental physical limits, which suggests that other technologies (e.g., photonics) may be necessary to continue scaling. David Kaeli, from Northeastern University, discussed the necessity, challenges, and new directions of extreme-parallelism multi-GPU systems.

The two industry sessions were also exceptionally well attended, and half of these works focused on practical aspects of machine learning.  For example, Udit Gupta described the importance (a majority of the ML workload) and the unique architectural challenges (memory bandwidth) of the recommendation systems employed at Facebook.  Daniel Richins discussed end-to-end performance challenges of deep-learning workloads in edge data centers, finding that tasks like data-format conversion introduce an "AI tax" that must be addressed to achieve high performance.

Workshops and Tutorials

HPCA featured a well-rounded set of tutorials spanning a wide variety of areas, which certainly supported the call for interdisciplinary research.  Tutorials included a full-day quantum-programming tutorial by researchers from NC State, an AI-benchmarking tutorial on AIBench and surrounding methodological issues by researchers from ICT, and a tutorial on the multi-GPU simulator Akita/MGPUSim from Northeastern University.  Several workshops and tutorials returned from previous years, on the topics of accelerating biology/bioinformatics, cloud runtimes, and cognitive architectures.

Business Meeting Highlights

  • Dissertation Award:  Of note was the very small number of submissions for the outstanding dissertation award (3 last year).  Please consider submitting in future years!
  • Student Advocates: IEEE TCCA has recently launched the TCCA Student Advocates initiative to provide help and advice to computer architecture students who are stressed out or have questions.  Student advocates are accessible during conferences, but also throughout the year.
  • Graduate Student Cohort: Relatedly, Elba Garza and Raghavendra Pothukuchi (elba@tamu.edu, pothuku2@illinois.edu)  are proposing to form a graduate student support group — by the students and for the students. One of their goals is to have student representatives in organizing groups for SIG events to reflect students’ concerns.
  • Diversity Policy: TCCA has recently developed a TCCA Diversity Policy on the formation of committees, advocated by Gabe Loh (AMD). Daniel Jimenez will be leading the effort.
  • One discussion point was whether or not to have HPCA outside of the US more frequently. The current practice is to have HPCA outside of the US every three years.  The final decision was to send a poll and let the community decide.
  • Yan Solihin shared observations from the HPCA program committee.
  • HPCA-27 (2021) will be held in Seoul, South Korea, with Jung Ho Ahn from Seoul National University as the general chair. The paper submission deadline will be July 31st. There was one bid for HPCA-28 (2022), for Montreal; Josep encouraged more volunteers and bids for 2022.

A significant fraction of the business meeting was spent discussing the handling of IEEE/ACM investigations in the past year.  The community expressed concern over the tragic incident and a need for transparency into the investigation. To the extent possible, the TCCA leadership clarified how the investigation has evolved in recent months.

This also led to a discussion about the review process itself.  One broad item was how to ensure reviewer accountability. Another was whether some functionality in HotCRP is unnecessary and should be limited (e.g., downloading all the papers, or seeing the names of voters during the PC meeting).  One immediate result of these discussions is that anonymous PC voting was implemented for the ISCA PC.

Final Thoughts

One trend at this year's HPCA is that many papers (including three of the four in the best paper session) open-sourced their tools and frameworks.  This is clearly to the benefit of our community, yet it raises questions about whether and how our community should evaluate and promote such contributions.  Of note was CGO's approach of having a "tools papers" track, in which artifact evaluation was mandatory. In fact, CGO papers that opted into artifact evaluation were significantly more likely to be accepted.  This will be increasingly relevant for the architecture community to consider going forward.

Finally, we hope that HPCA is not the last architecture conference of 2020, and that we can continue to have meaningful in-person interactions in a post-coronavirus world for the remainder of the year — fingers crossed. 🤞
