IEEE TCCA Blog (https://www.blog.ieeetcca.org)

ACE Center: Energy-efficient Distributed Computing for the Next Decade and Beyond
Published Wed, 21 Aug 2024. https://www.blog.ieeetcca.org/ace-center-energy-efficient-distributed-computing-for-the-next-decade-and-beyond/

One of the most pressing challenges facing today’s digital society is how to curb the relentless increase in the energy consumption of computing. Without major action, this increase is likely to accelerate as ubiquitous AI models are embedded in applications ranging from personalized content creation and extended reality to automation and control. A recent New York Times article that may prove prescient points out how US electric utilities are being overwhelmed by demand from data centers. The message for computer architecture and systems researchers is clear: here is an area where your ideas can have a great impact.

The ACE Center for Evolvable Computing

Based on several studies that crystallized into two reports, the Semiconductor Research Corporation (SRC) established the ACE Center, focused on developing new architectures and paradigms for distributed computing with a radically new computing trajectory, to attain order-of-magnitude improvements in energy efficiency. The center is rooted in the academic community, comprising 21 faculty members [1] with diverse domain expertise and over 100 graduate students.

On the surface, the center’s roadmap will look familiar to any researcher in our field: leverage hardware accelerators and integration, minimize data movement, co-design hardware and software innovations, and integrate security and correctness from the ground up so they do not have to be retrofitted later. The challenge is to go beyond many disjoint improvements and provide coordinated, multidisciplinary innovations with substantial combined impact. 

An idea that underpins ACE is evolvability: accelerator hardware, specialized communication stacks, or customized security mechanisms should be designed for extensibility and composability. They should have compatible interfaces, accommodate upgrades of their external environments, and be easily replaceable by (and co-exist with) next-generation designs of the same module. These principles have served us well with general-purpose processors; we should retain them as we move to an accelerator-centric era.

High-Performance Energy-Efficient Computing

To attain high energy efficiency, it is our vision that data centers will contain a large number of hardware accelerators with different functionalities, organized into distributed ensembles. Smart compilers will generate executable code from different sections of an application for different types of accelerators. Then, the runtime will assign each of these binaries to the most appropriate accelerator from an ensemble. Such ensembles will be spatially and temporally shared by multiple tenants in a secure manner. Further, to attain the highest efficiency, new classes of general-purpose processors will be specialized for different workload domains.
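To make the runtime's role concrete, here is a minimal sketch (our own illustration, not the ACE runtime) of how a scheduler might pick an accelerator from an ensemble for each compiled kernel: choose the binary's target that minimizes energy while meeting a latency requirement. All accelerator names and throughput/energy numbers are invented.

```python
# Hypothetical sketch: a runtime picks the most energy-efficient accelerator
# from an ensemble for each compiled kernel. Names and numbers are
# illustrative, not part of the ACE design.

# Per-accelerator profile per kernel class: (ops/sec, joules/op).
ENSEMBLE = {
    "gpu":         {"dense_gemm": (2.0e12, 4.0e-11), "sparse_spmv": (1.0e11, 9.0e-11)},
    "sparse_asic": {"sparse_spmv": (4.0e11, 1.5e-11)},
    "fpga":        {"dense_gemm": (3.0e11, 2.5e-11), "sparse_spmv": (1.5e11, 3.0e-11)},
}

def assign(kernel_class, ops, deadline_s):
    """Choose the accelerator that minimizes energy while meeting the deadline."""
    best = None
    for name, profiles in ENSEMBLE.items():
        if kernel_class not in profiles:
            continue  # no binary was generated for this accelerator
        rate, j_per_op = profiles[kernel_class]
        if ops / rate > deadline_s:
            continue  # would violate the latency requirement
        energy = ops * j_per_op
        if best is None or energy < best[1]:
            best = (name, energy)
    return best

print(assign("sparse_spmv", ops=1e10, deadline_s=1.0))
```

Under these toy numbers, the sparse ASIC wins the sparse kernel on energy, while a tight deadline can exclude slower candidates entirely.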

Toward this vision, we are developing Composable Compute Acceleration (COCA), a framework where multiple heterogeneous chiplets are integrated into a Multichip Module (MCM). Chiplets include general-purpose cores, accelerator ASICs, and FPGA dies connected with a synthesizable chip-to-chip UCIe (Universal Chiplet Interconnect Express) interface. Reconfigurability is attained offline by combining different mixes of chiplets in different MCM instances, and online by reprogramming the chiplets based on the needs of popular workloads.

Accelerators in different nodes of the datacenter are harnessed together to accelerate an operation with large data or compute needs, such as datacenter-wide sparse tensor operations. In this case, if the operation is heavily communication-bound, developing efficient algorithms for data transfer—possibly leveraging the sparse pattern of the data—is crucial.
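As a back-of-the-envelope illustration of why exploiting sparsity matters for such transfers (our own toy example, not an ACE algorithm), compare the bytes moved when shipping a vector densely versus as (index, value) pairs in coordinate format:

```python
# Illustrative sketch: exploiting sparsity to cut bytes moved when
# shipping a tensor between datacenter nodes.

def dense_bytes(n, elem_bytes=4):
    # Dense transfer: every element crosses the network.
    return n * elem_bytes

def sparse_bytes(values, elem_bytes=4, index_bytes=4):
    # Coordinate format: one (index, value) pair per nonzero.
    nnz = sum(1 for v in values if v != 0)
    return nnz * (elem_bytes + index_bytes)

vec = [0.0] * 1_000_000
for i in range(0, len(vec), 100):   # 1% nonzeros
    vec[i] = 1.0

print(dense_bytes(len(vec)))   # 4000000 bytes dense
print(sparse_bytes(vec))       # 80000 bytes as (index, value) pairs
```

At 1% density, even the 2x per-element overhead of carrying indices still yields a 50x reduction in traffic, which is exactly the regime where a communication-bound operation benefits.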

A new compiler infrastructure is key to this vision. We are developing an open-source unified ACE compiler stack to program diverse accelerators. The infrastructure includes specialized front-end compilers for large language models or graph neural networks that translate the code into a shared intermediate form. Then, back-end ML compilers generate code for various hardware accelerators. We believe that this infrastructure can catalyze the use of novel hardware accelerators.
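The two-stage flow can be sketched as follows. This is a deliberately tiny illustration of the front-end/shared-IR/back-end split; the IR ops, target names, and emitted strings are all made up and do not reflect the actual ACE compiler stack:

```python
# Hedged sketch of the flow: a domain front-end lowers a model to a shared
# IR, and per-accelerator back-ends emit target code from that IR.

def llm_frontend(model):
    # A front-end for transformer-like models: lower layers to generic IR ops.
    return [("matmul", layer) for layer in model["layers"]]

# One code generator per hardware target, all consuming the same IR.
BACKENDS = {
    "systolic_asic": lambda op: f"SYSTOLIC.{op[0]}({op[1]})",
    "gpu":           lambda op: f"PTX.{op[0]}({op[1]})",
}

def compile_for(ir, target):
    return [BACKENDS[target](op) for op in ir]

ir = llm_frontend({"layers": ["attn0", "ffn0"]})
print(compile_for(ir, "systolic_asic"))
```

The point of the shared intermediate form is visible even at this scale: adding a new accelerator means adding one back-end entry, with no change to any front-end.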

We specialize CPUs for specific workload domains. For example, microservice environments execute short service requests that interact via remote procedure calls and are subject to tail latency constraints. General-purpose processors, in contrast, are designed for traditional monolithic applications: they support global hardware cache coherence, incorporate microarchitecture for long-running, predictable applications such as advanced prefetching, and are optimized for average rather than tail latency. To address this mismatch, we propose 𝜇Manycore, which discards some of these features and is optimized for tail latency.
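A toy numerical example (ours, with invented latencies) shows why the average-versus-tail distinction matters: a service where a small fraction of requests hit a slow path can look healthy on mean latency while badly violating a tail constraint.

```python
# Toy illustration of why optimizing for average latency misses the tail:
# a service where 2% of requests take a 50x slower path.

def percentile(samples, p):
    # Nearest-rank percentile over a sorted copy of the samples.
    s = sorted(samples)
    k = min(len(s) - 1, round(p / 100 * (len(s) - 1)))
    return s[k]

latencies_us = [100] * 980 + [5000] * 20   # 2% of requests are 50x slower
mean = sum(latencies_us) / len(latencies_us)
p99 = percentile(latencies_us, 99)

print(f"mean = {mean:.0f} us, p99 = {p99} us")
```

Here the mean is under 200 us while the p99 is 5000 us: a processor tuned for average latency would report success on exactly the metric that microservice SLOs do not use.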

Communication and Coordination

A striking feature of modern data centers is that hardware remains highly underutilized. This is a major source of energy waste. Moreover, the software infrastructure that enables data center operation contains major inefficiencies resulting from the desire to remain general-purpose. These are some of the aspects that ACE is addressing. Our vision includes reconfigurable network topologies to enable efficient use of the resources, a nimble runtime that bundles computation in small buckets and ships them where the data is, flexible networking stacks specialized to the accelerators available in the data center, and computing in network switches and SmartNICs to efficiently offload processor tasks.

A contributor to hardware underutilization is the inflexibility of the data center network. Different workloads exhibit different communication patterns and, in some cases, the patterns are clear and periodic—such as in AI training. Yet the inter-node links and their bandwidth are fixed, which is suboptimal. In ACE, we dynamically reconfigure optical interconnects to adjust network topology and link bandwidth based on the workload. Moreover, we have developed LIBRA to perform design space exploration of networks for distributed AI training. LIBRA recommends the topology and bandwidth allocation for each level of the network.
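The flavor of this design space exploration can be conveyed with a minimal sketch. This is not LIBRA's model: the two-level network, traffic volumes, serial cost model, and bandwidth budget below are all invented for illustration.

```python
# Hedged sketch of LIBRA-style exploration: exhaustively search bandwidth
# allocations across two network levels under a total budget, minimizing a
# toy collective-communication time model.

TRAFFIC_GB = {"intra_node": 40.0, "inter_node": 10.0}  # volume each level carries
BUDGET_GBS = 100                                       # total bandwidth to split

def comm_time(alloc):
    # Toy model: the two levels operate in series, so times add.
    return sum(TRAFFIC_GB[lvl] / alloc[lvl] for lvl in TRAFFIC_GB)

best = None
for intra in range(10, BUDGET_GBS, 10):
    alloc = {"intra_node": intra, "inter_node": BUDGET_GBS - intra}
    t = comm_time(alloc)
    if best is None or t < best[1]:
        best = (alloc, t)

print(best)
```

Even this crude search recovers the intuition that the level carrying more traffic deserves more bandwidth: the best grid point gives the intra-node level the larger share rather than an even split.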

Current networking stacks are general-purpose, even though important workloads or hardware devices may not need many of the features. Moreover, they use the kernel for secure operation, which further adds to the execution overhead. Bypassing the kernel, e.g., with RDMA, results in fast but insecure communication. To address these issues, we propose using eBPF (Extended Berkeley Packet Filter) and customizing the network stack to particular uses. The result is fast and secure operation.

Even with the most advanced hardware and leanest communication stacks, performance will lag if accelerators are often idle because the scheduler fails to assign work to them. Similarly, energy savings will not be realized if accelerators often operate on remote data, as energy for data movement will remain dominant. To tackle these challenges, we propose a runtime that bundles computation in small buckets called Proclets that are easy to schedule and migrate. Proclets enable load rebalancing and consolidation across compute units. Further, by migrating computation, they minimize the need to move data.
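The core migration tradeoff can be stated in one line. The sketch below is our simplification (the actual Proclet runtime policy may weigh more factors, such as load and locality): ship whichever payload is smaller, the computation's state or the data it would otherwise pull.

```python
# Illustrative sketch: migrate the computation when its state is smaller
# than the remote data it would otherwise have to pull over the network.

def should_migrate_compute(proclet_state_bytes, remote_data_bytes):
    # Shipping the smaller payload minimizes bytes on the wire.
    return proclet_state_bytes < remote_data_bytes

# A small Proclet scanning a large remote table: move the Proclet.
print(should_migrate_compute(64 * 1024, 2 * 1024**3))   # True
# A large in-memory aggregation reading a tiny remote key: fetch the data.
print(should_migrate_compute(512 * 1024**2, 4 * 1024))  # False
```

Keeping Proclets small is what makes the first case common, so computation usually travels to the data rather than the reverse.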

Computing in SmartNICs and network switches will improve the performance and energy efficiency of distributed workloads. We are examining efficient host-NIC interfaces, including a cache-coherent one (CC-NIC), and use cases for compute offload to NICs. Computing in switches can be highly efficient in some applications, such as straggler detection and handling in AI training. To generate efficient code that runs on a switch for uses such as anomaly detection or traffic classification, we propose an automated system: the user specifies high-level directives, and the system automatically generates efficient ML models to run on the switches.

Security and Correctness

This part of the center focuses on conceiving new security paradigms that are more effective and easier to use than current ones. It also develops techniques for security and correctness verification from the earliest stages of hardware design. A challenge involves finding a common framework where computer designers and formal verification experts can effectively cooperate. 

As accelerators will be routinely operating on sensitive information, it is necessary to rethink how tools and mechanisms for CPU security and correctness can be redesigned for an accelerator-rich environment. Some of the current efforts include using information flow control in multi-tenant accelerators, applying automatic RTL-level instrumentation and analysis to detect security vulnerabilities in accelerators, and developing domain-specific trusted execution environments (TEEs) for accelerators. In particular, we envision an automated framework to generate customized TEEs for accelerators in a programmer-friendly manner.

We have developed several security and correctness techniques that we hope will be useful to the community. They include TEESec for pre-silicon security verification of TEEs, Untangle for high-performance and safe dynamic partitioning of hardware structures, SpecVerilog for design verification using a security-typed hardware description language, and G-QED for quick and thorough pre-silicon verification of hardware designs. Much work still needs to be done in this area.

Concluding Remarks

This blog has shared a research agenda that we hope will be embraced, expanded, and driven by a large section of our community. If, as a community, we manage to address the energy challenge, the rewards will be high. Incremental progress as usual is not an option.

[1] ACE Center Researchers:

Josep Torrellas (U. of Illinois), Minlan Yu (Harvard), Tarek Abdelzaher (U. of Illinois), Mohammad Alian (U. of Kansas), Adam Belay (MIT), Manya Ghobadi (MIT), Rajesh Gupta (UCSD), Christos Kozyrakis (Stanford), Tushar Krishna (GA Tech), Arvind Krishnamurthy (U. of Washington), Jose Martinez (Cornell), Charith Mendis (U. of Illinois), Subhasish Mitra (Stanford), Muhammad Shahbaz (Purdue), Edward Suh (Cornell), Steven Swanson (UCSD), Michael Taylor (U. of Washington), Radu Teodorescu (Ohio State U.), Mohit Tiwari (U. of Texas), Mengjia Yan (MIT), Zhengya Zhang (U. of Michigan), Zhiru Zhang (Cornell).

About the Author

Josep Torrellas is the Saburo Muroga Professor of Computer Science at the University of Illinois at Urbana-Champaign and the Director of the SRC/DARPA ACE Center for Evolvable Computing. He has made contributions to shared-memory multiprocessor architectures and thread-level speculation.

Architecture 2.0: Why Computer Architects Need a Data-Centric AI Gymnasium
Published Sat, 17 Jun 2023. https://www.blog.ieeetcca.org/architecture-2-0-why-computer-architects-need-a-data-centric-ai-gymnasium/

Machine learning-driven computer architecture tools and methods have the potential to drastically shape the future of computer architecture. The question is: how can we lay the foundation to effectively usher in this era? In this post, we delve into the transformative impact of machine learning (ML) on the research landscape, emphasizing the importance of understanding both its potential and pitfalls to fully support ML-assisted computer architecture research. By exploring these advancements, our aim is to highlight the opportunities that lie ahead and outline the collective steps that we, as a community, can take towards realizing the era of “Architecture 2.0.”

The Dawn of Architecture 2.0

In recent years, computer architecture research has been enriched by the advent of ML techniques. With the increasing complexity and design space of modern computing systems, ML-assisted architecture research has become a popular approach for improving the design and optimization of complex, heterogeneous systems spanning edge and cloud computing.

ML techniques, such as deep learning and reinforcement learning, have shown promise in optimizing and designing various hardware and software components of computer systems, such as memory controllers, resource allocation, compiler optimization, cache allocation, scheduling, cloud resource sharing, power consumption, security and privacy. This has led to a proliferation of ML-assisted architecture research, with many researchers exploring new methods and algorithms to improve computer systems’ efficiency and learned embeddings for system design. 

With the emergence of Large Language Models (LLMs) like ChatGPT and Bard, as well as generative AI models like DALL·E 2, Midjourney, and Stable Diffusion, future ML technologies are bound to offer a plethora of exciting possibilities for a new generation of computer architects. For instance, prompts such as “Act as a computer architect and generate me an ALU such that it meets the following requirements: …” might become commonplace. Such capabilities, coupled with advances like AutoGPT, may enable future AI assistants to become “proactive” rather than merely “reactive,” which will likely unlock new capabilities. Users will only need to provide goals to the model, and it will do everything by itself in an autonomous, iterative loop of planning, critiquing, acting, and reviewing, elevating methods like prompt engineering for chip design to the next level.

This paradigm—Architecture 2.0—uses machine learning to minimize human intervention and build more complex, efficient systems in a shorter timeframe. Undoubtedly, Architecture 2.0 will bring about a revolutionary shift in research and development within computer architecture. Exciting new avenues will be explored. Generative AI will likely play a creative role in Architecture 2.0 by empowering architects and designers to rapidly generate and explore a wide array of design options. These areas exemplify the vast potential for growth and innovation within the field, similar to the transformative impact that Software 2.0 is having on the programming world.

Challenges with ML-Assisted Architecture Design

While ML-driven architecture research holds great promise, it also poses several challenges that we must understand and tackle collectively. Figure 1 illustrates some of the major challenges, including but not limited to the following:


Figure 1: Key challenges with ML-assisted architecture design.

  1. Lack of large, high-quality (i.e., representative) public datasets: Machine learning-driven systems research often relies on large, high-quality datasets to train and validate models. The efficacy of machine learning can be attributed, in part, to the development and utilization of high-quality datasets. However, in the context of computer architecture, such datasets are scarce and often not reusable, making it challenging for researchers to conduct their studies, compare their results, and develop effective solutions. Adding to the already challenging situation, representative datasets that accurately mirror the intricacies of real-world fleets and encompass the complete operational behavior of the entire system are even more difficult to come by.
  2. Inability to “scrape” the internet for creating public datasets: In many other machine learning domains, researchers can collect data by simply scraping the internet or using publicly available datasets (e.g. large language models mainly use readily available web crawl data for training). However, this approach is neither feasible nor scalable in the context of computer architecture research, as the data required is often specific to certain hardware and software configurations and may be proprietary data.
  3. Data generation from cycle-level simulators is slow and difficult: Simulators are often used to generate data for machine learning-driven systems research (such as building proxy models, searching for architecture parameters, etc.). Such simulators are often slow, computationally expensive, and may not consistently reflect real-world systems accurately. Additionally, simulations are often intractable for multi-node or datacenter systems, limiting scalability and reducing data quality.
  4. Rapidly evolving ML algorithms landscape: The machine learning algorithms landscape is constantly changing, with new models and techniques being developed regularly. This can make it difficult for researchers to keep up with the latest developments and integrate them into their projects (i.e., the hardware lottery).
  5. Unclear applicability of ML algorithms to architecture problems: While machine learning has demonstrated success in a variety of domains, it is not always evident which problems in computer architecture are amenable to be solved effectively by ML algorithms. In addition, it is not clear how ML algorithms can be effectively applied to address computer architecture problems. This may result in wasted resources and suboptimal solutions.
  6. Need for agile full-stack co-design: It is necessary for all the system components to evolve together. Unfortunately, certain advancements in algorithms often get overlooked due to the lack of corresponding hardware support. For instance, although it is evident that machine learning can benefit from leveraging sparsity, it is rarely implemented because, without hardware support, performance improvements are not achievable. Compilers must adapt and advance alongside both the hardware advancements and the evolving algorithms.
  7. Difficulty with verifying, validating, and interpreting ML algorithms for system design: Architects need to verify and validate their designs, regularly reason about the consequences of their decisions, and interpret the implications of each design point on the overall performance of the target system. However, interpretability remains a missing piece: understanding why a particular ML-assisted approach works, and tracing the provenance of how each decision or tradeoff was made. Addressing it requires reproducibility and systematically defined metrics, such as accuracy versus uncertainty.

In addition, we believe that the progress of ML-assisted architecture research is being hindered by several other factors: the absence of standardized benchmarks and baselines, challenges with reproducibility, and difficulties in evaluation. These issues collectively impede the advancement of the field, and they have garnered attention and generated interest within the machine learning community, resulting in the organization of recent workshops and challenges.

Comparing different methods and algorithms is challenging. For instance, different researchers may use different datasets or metrics, making it challenging to compare results across studies (e.g., cycle-accurate EdgeTPU vs. analytical DNN accelerators). One study may use a dataset that is much easier or harder to learn from than another or use a less exhaustive hyperparameter search, which could lead to different results even if the same algorithm is used. Similarly, one study may use a metric that emphasizes a different aspect of performance or evaluation than another study, which could lead to different conclusions about the relative effectiveness of different algorithms.

Reproducing published ML results can also be difficult, especially if the code or data used is not publicly available. Without access to the code or data, it can be challenging, if not impossible, to determine whether differences in results are due to differences in methodology, implementation, or dataset. This can lead to a lack of confidence in published results and make it difficult for researchers to build upon each other’s work and make progress. 

Furthermore, evaluating the effectiveness of ML algorithms in architecture research can be complex, as the performance of the algorithms may depend on various factors such as hardware configuration, workload, and optimization objectives (i.e., hyperparameter lottery). An algorithm that performs well on one type of workload may not perform well on another type of workload, or an algorithm that is optimized for one type of hardware configuration may not be effective on a different configuration. This makes it challenging to generalize results across different scenarios and to identify the conditions under which an algorithm is most effective.

Data-centric AI Gymnasium for Architecture 2.0

To overcome these challenges, we need to embrace a data-centric AI mindset where data rather than code is treated as a first-class citizen in computer architecture. Traditionally, tools such as gem5 and Pin were used to explore, design, and evaluate architectures based on application-level code characteristics. But when data is the rocket fuel for ML algorithms, we must build the next generation of data-centric tools and infrastructure that will enable researchers and practitioners to collaborate, and develop standard benchmarks, metrics, and datasets. We also need to invest more in efficient data generation techniques that will be useful for ML-assisted design. Last but not least, we need a playbook or a taxonomy that outlines how to effectively apply ML to systems problems.

To this end, we believe that we can learn from and leverage approaches like the OpenAI Gym-style environment for computer architecture research. The OpenAI Gym is a widely accepted toolkit in the ML community for developing and comparing reinforcement learning algorithms. It accelerated research by providing a standard interface (API) to communicate between learning algorithms and environments, as well as a standard set of environments compliant with that API. Since its release, this API has become the de facto standard in the field. Gym has also provided a common platform for researchers to develop and compare reinforcement learning algorithms. This has led to a number of important advances in the field, including the development of new algorithms, such as DQN and Proximal Policy Optimization (PPO), that are more efficient and effective than previous methods. Gym has also been used to develop new benchmarks for algorithms. These benchmarks provide a way to compare the performance of different algorithms on a common set of tasks, helping researchers identify which algorithms are most effective for which tasks.
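To make the Gym analogy concrete, here is a minimal sketch of what an architecture environment exposing the familiar reset()/step() interface could look like. The design space (cache sizes) and the reward model are invented for illustration; a real Architecture 2.0 gym would wrap a simulator behind this API.

```python
# Minimal sketch of a Gym-style environment for an architecture design
# space, mirroring the reset()/step() API that Gym standardized.

class CacheSizingEnv:
    SIZES_KB = [32, 64, 128, 256, 512]

    def reset(self):
        self.idx = 0                      # start from the smallest cache
        return self._observe()

    def step(self, action):
        # action: 0 = shrink, 1 = keep, 2 = grow the cache one notch
        self.idx = max(0, min(len(self.SIZES_KB) - 1, self.idx + action - 1))
        reward = self._reward()
        done = True                       # one-shot episode in this toy
        return self._observe(), reward, done, {}

    def _observe(self):
        return self.SIZES_KB[self.idx]

    def _reward(self):
        # Toy tradeoff: hit rate grows with size, energy penalizes size.
        size = self.SIZES_KB[self.idx]
        hit_rate = size / (size + 64)
        energy_cost = size / 1024
        return hit_rate - energy_cost

env = CacheSizingEnv()
obs = env.reset()
obs, reward, done, info = env.step(2)     # try growing the cache
print(obs, round(reward, 3))
```

Because the interface is standard, any agent written against it, whether reinforcement learning, Bayesian optimization, or plain random search, can drive this environment unchanged; swapping in a cycle-level simulator only changes the internals of step().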

In a similar vein, we need a gymnasium for Architecture 2.0 to foster and nurture ML-assisted architecture research and pursue data-driven solutions. It would enable researchers to pose intriguing questions and share their code and data with the community, promoting collaboration and accelerating research progress.

The Architecture 2.0 gymnasium would enable researchers to easily add simulators as new environments, compare results objectively, share datasets, and develop new algorithms. Architecture research encompasses a wide range of methods to explore the design space and develop novel solutions, including reinforcement learning, Bayesian optimization, ant colony optimization, genetic algorithms, and more. Thus, it will be crucial for the gym to possess the necessary flexibility to accommodate all these diverse approaches to exploration. Furthermore, the gym would naturally encourage researchers to publish their breakthrough papers alongside datasets and code, providing readers with valuable insights not only into the model strategy but also into the data pre-processing techniques and hyperparameter tuning processes employed. By promoting such transparency, the gymnasium can foster reproducibility and enable objective comparisons, as emphasized in this post.


Figure 2: Creating an active community for Architecture 2.0.

In general, we recommend the gymnasium to encompass the following tenets as illustrated in Figure 2:

  1. Curated datasets: A collection of representative datasets and benchmarks designed to systematically evaluate different ML algorithms in computer architecture research. For instance, we need more resources like the open-source Google workload traces that were put out to aid systems research.
  2. Leaderboards: Leaderboards are instrumental in fostering healthy competition among researchers. By showcasing the latest results, we can inspire researchers to push boundaries, compare solutions, develop new solutions, and refine existing methodologies. Leaderboards can also serve as effective benchmarks. There is much we can learn from existing leaderboards like Dynabench and adapt for our own purposes.
  3. Competitions: We should revive the “Workshop on Computer Architecture Competitions” and other similar computer architecture competitions (e.g. Branch Prediction, ML Prefetching) and adapt them for Architecture 2.0 to bootstrap the discovery of state-of-the-art methods and algorithms.
  4. Challenges: Challenges hosted on a broadly accessible platform, such as hackathons or workshops, to promote collaboration and facilitate knowledge exchange among researchers and practitioners. To supercharge the architecture community, we need a Kaggle-style mentality for Architecture 2.0 that would serve as a hub to attract, nurture, train, and challenge a new generation of architects from all around the world.

To nurture such a healthy and active community, we need accessible open-source tools and libraries that readily facilitate the implementation and testing of different ML algorithms in computer architecture research. For instance, tools like Pin provide high-level APIs that abstract away low-level implementation details to make developing and deploying program instrumentation tools easy. By making sure we develop Architecture 2.0 ML tools, such as CompilerGym and ArchGym, that are transparent and easy to run for the designer, we can empower researchers to focus on their core expertise instead of getting overwhelmed with details irrelevant to them.

In addition to tools, we also need consistent and standardized evaluation metrics that can be reliably used to compare the performance, efficiency, and robustness of different ML algorithms in computer architecture research. Metrics often appear straightforward in hindsight, but there is a considerable amount of nuance associated with them. An incorrect metric can result in misguided optimization strategies. For instance, it took a long time for ML processor architects to realize that relying solely on TOPS/W can be harmful.
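A toy calculation (our own, with invented numbers) makes the TOPS/W pitfall concrete: what a workload actually receives is peak throughput scaled by achieved utilization, so a chip with the better peak ratio can lose in delivered efficiency.

```python
# Toy arithmetic showing why peak TOPS/W alone misleads: delivered
# efficiency is peak throughput times achieved utilization, per watt.

def delivered_tops_per_watt(peak_tops, utilization, watts):
    return peak_tops * utilization / watts

# Accelerator A: great on paper, but the workload keeps it only 20% busy.
a = delivered_tops_per_watt(peak_tops=400, utilization=0.20, watts=100)
# Accelerator B: half the peak, but sustains 70% utilization.
b = delivered_tops_per_watt(peak_tops=200, utilization=0.70, watts=100)

print(a, b)   # B delivers more TOPS/W despite the lower peak
```

This is precisely the nuance a standardized metric suite must capture: a leaderboard ranking on peak TOPS/W would order these two accelerators backwards.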

Call for Participation

Building the Architecture 2.0 ecosystem extends beyond the capabilities of any individual group. It requires a collective effort. Therefore, we invite the community to join us in the effort to identify, design, and develop the next generation of ML-assisted tools and infrastructure. If you are interested in contributing to Architecture 2.0, please fill out this Google Form to meet with us at ISCA 2023 as part of the Federated Computing Research Conference. Students and researchers of all ages and groups are welcome. Even if you are unable to attend the conference, please take a moment to fill out the form. This will enable us to contact you when we schedule a community kickoff meeting. We look forward to hearing from you and hopefully seeing you soon. Let’s build the future together!

Conclusion

An Architecture 2.0 data-centric AI gymnasium would provide a number of benefits for academia and industry. It would make it easier for academic researchers to experiment with different algorithms, understand their pros and cons, reproduce each other’s results, compare the performance of their own algorithms to strong baselines, and explore more of the design space. The creation of such an ecosystem would also benefit industry, as it would accelerate the pace of innovation, lead to the development of new and more efficient designs, and help bridge the gap between machine learning and the architecture and systems communities. What we propose is not unfounded: almost two decades ago, MicroLib sought to enable researchers to compare architectural designs through a standard interface. Considering the advancements in technology and the evolution of the field towards ML-assisted design, the case for a shared ecosystem is now even more compelling.

Acknowledgments

We proactively solicited feedback from numerous people to craft this vision. We appreciate the feedback from Saman Amarasinghe (MIT), David Brooks (Harvard), Brian Hirano (Micron), Qijing Jenny Huang (Nvidia), Ravi Iyer (Intel), David Kanter (MLCommons), Christos Kozyrakis (Stanford), Hsien-Hsin Sean Lee (Intel), Benjamin C. Lee (UPenn), Jae W. Lee (SNU), Martin Maas (Google DeepMind), Divya Mahajan (GaTech), Phitchaya Mangpo Phothilimthana (Google DeepMind), Parthasarathy Ranganathan (Google), Carole-Jean Wu (Meta), Hongil Yoon (Google), Cliff Young (Google DeepMind). We would like to acknowledge and highlight the contributions of Srivatsan Krishnan (Harvard), who led the research project that generated many of the ideas discussed in this work. We also extend our gratitude to Jason Jabbour (Harvard), Shvetank Prakash (Harvard), Thierry Thambe (Harvard), and Ikechukwu Uchendu (Harvard) for their valuable feedback and contributions.

About the Authors

Vijay Janapa Reddi is the John L. Loeb Associate Professor of Engineering and Applied Sciences at Harvard University. He helped co-found MLCommons, a non-profit organization committed to accelerating machine learning for the benefit of all. Within MLCommons, he serves as Vice President and holds a position on the board of directors. Vijay oversees MLCommons Research, which brings together a diverse team of over 125 researchers from various organizations to provide exploratory value to MLCommons members. He co-led the development of the MLPerf benchmark, which encompasses ML in datacenters, edge computing, mobile devices, and the Internet of Things (IoT). Vijay is the recipient of best paper and IEEE Micro TopPicks awards and other accolades, including the Gilbreth Lecturer Honor from the National Academy of Engineering (NAE) and IEEE TCCA Young Computer Architect Award.

Amir Yazdanbakhsh is a research scientist at Google DeepMind. Most of his research revolves around Computer Systems and Machine Learning. Amir is the co-founder and co-lead of the Machine Learning for Computer Architecture team where they leverage the recent machine learning methods and advancements to innovate and design better hardware accelerators.

Recent IEEE Computer Architecture Letters Publications (2022 Issue 1)
Published Mon, 23 Jan 2023. https://www.blog.ieeetcca.org/recent-ieee-computer-architecture-letters-publications-2022-issue1/

In this blog post, we highlight recent publications from IEEE Computer Architecture Letters (CAL), Vol. 21, Issue 1 (Jan–Jun 2022). We include a short summary when provided by the authors.

IEEE CAL Editorial Board

Accelerating Graph Processing With Lightweight Learning-Based Data Reordering

Mo Zou, Mingzhe Zhang, Rujia Wang, Xian-He Sun, Xiaochun Ye, Dongrui Fan, Zhimin Tang

MPU-Sim: A Simulator for In-DRAM Near-Bank Processing Architectures

Xinfeng Xie, Peng Gu, Jiayi Huang, Yufei Ding, Yuan Xie

MPU-Sim is an end-to-end simulator, spanning parallel programs to hardware architectures, for near-bank in-DRAM processing-in-memory. With calibrated hardware simulation models and well-defined programming interfaces, MPU-Sim can facilitate future research and development of near-bank in-DRAM processing systems and hardware architectures.

A Pre-Silicon Approach to Discovering Microarchitectural Vulnerabilities in Security Critical Applications

Kristin Barber, Moein Ghaniyoun, Yinqian Zhang, Radu Teodorescu

This paper introduces a promising new direction for detecting microarchitectural vulnerabilities by using pre-silicon simulation, tracing infrastructure, and differential analysis techniques to search for exploitable behavior of hardware designs during the execution of an application of interest. 

MQSim-E: An Enterprise SSD Simulator

Dusol Lee, Duwon Hong, Wonil Choi, Jihong Kim

Lightweight Hardware Implementation of Binary Ring-LWE PQC Accelerator

Benjamin J. Lucas, Ali Alwan, Marion Murzello, Yazheng Tu, Pengzhou He, Andrew J. Schwartz, David Guevara, Ujjwal Guin, Kyle Juretus, Jiafeng Xie

This paper focuses on developing an efficient PQC hardware accelerator for the binary Ring-learning-with-errors (BRLWE)-based encryption scheme, a promising lightweight PQC suitable for resource-constrained applications. 

Characterizing and Understanding Distributed GNN Training on GPUs

Haiyang Lin, Mingyu Yan, Xiaocheng Yang, Mo Zou, Wenming Li, Xiaochun Ye, Dongrui Fan

This paper presents an in-depth analysis of distributed GNN training by profiling end-to-end execution with a state-of-the-art framework, PyTorch Geometric (PyG), revealing several significant observations and providing useful guidelines for both software and hardware optimization.

LSim: Fine-Grained Simulation Framework for Large-Scale Performance Evaluation

Hamin Jang, Taehun Kang, Joonsung Kim, Jaeyong Cho, Jae-Eon Jo, Seungwook Lee, Wooseok Chang, Jangwoo Kim, Hanhwi Jang

Performance modeling of a large-scale workload without a large-scale system is extremely challenging. To this end, we present LSim, an accurate framework for large-scale performance modeling. Based on the captured workload behavior within small-scale workload traces, LSim predicts the performance at large scales.

LINAC: A Spatially Linear Accelerator for Convolutional Neural Networks

Hang Xiao, Haobo Xu, Ying Wang, Yujie Wang, Yinhe Han

This paper proposes a linear regression-based method to exploit the spatially linear correlation between activations and weights of CNN models. Stronger bit-sparsity is extracted to further accelerate bit-sparse activation processing and reduce memory communication.

]]>
In Memoriam — Michel Dubois (April 25, 1953 – July 7, 2022) https://www.blog.ieeetcca.org/in-memoriam-michel-dubois-april-25-1953-july-7-2022/?utm_source=rss&utm_medium=rss&utm_campaign=in-memoriam-michel-dubois-april-25-1953-july-7-2022 Wed, 12 Oct 2022 18:15:01 +0000 http://www.blog.ieeetcca.org/?p=776

With deep sadness and great sorrow, we inform the community about the recent passing of our dear friend and colleague, Professor Michel Dubois, a foundational leader in our field of computer architecture and parallel processing.

Michel earned his Bachelor’s degree in Electrical Engineering and Nuclear Engineering from the Faculte Polytechnique de Mons (Belgium) in 1976, his Master’s degree in Electrical Engineering from the University of Minnesota in 1978, and his Ph.D. degree in Electrical Engineering in 1982 under the mentorship of Professor Fayé Briggs. He was a Research Engineer in the Central Research Laboratory of Thomson-CSF in Orsay, France, from March 1982 until August 1984 before joining the faculty of the Viterbi School of Engineering at the University of Southern California (USC), where he rose through the ranks as a tenured full professor, retiring in early 2022 as Professor Emeritus of Electrical and Computer Engineering.

A promising and accomplished computer architect even at the beginning of his academic career, Michel made early contributions in multiprocessor memory systems that identified the need to provide software programmers with effective shared-memory coherence and relaxed memory consistency models, enabling more efficient handling of memory accesses in multiprocessor computing systems; these have proved to be seminal contributions to our field. Among his many well-known works are his insights and contributions to weak consistency, processor consistency, release consistency, lazy release consistency, and other relaxed or delayed consistency models for efficiently supporting synchronization, coherence, and ordering of events and accesses in shared-memory multiprocessors. He also contributed new approaches for verifying multiprocessor cache coherence protocols, detecting and eliminating useless misses in shared-memory multiprocessors, and designing virtual-address caches.

Furthering his work in high-performance parallel processing and shared-memory systems, his research on cache and memory system design—both in single processors and in multiprocessors—resulted in over 150 peer-reviewed publications, three book chapters, three books, and various open-source design tools such as SlackSim and Parma, which have been used in industry. One of his books is the text titled Parallel Computer Organization and Design, co-authored with Per Stenström (Chalmers University) and Murali Annavaram (USC), which has seen wide adoption and use. Michel’s Rapid Prototyping engines for Multiprocessors (RPM-1 and RPM-2), produced as exploratory research artifacts under large-scale funding from the U.S. National Science Foundation (NSF), pioneered the effective execution of programs across the design space of multiprocessor configurations through emulation. This project demonstrated a new cost-effective testbed methodology for validating novel computer architecture designs with low-cost hardware prototyping using FPGAs. The legacy of his then-revolutionary approach to computer architecture and parallel processor design is evident in various commercialized multiprocessors.

Michel’s scholarly contributions have been recognized with a U.S. patent on “Apparatus for Maintaining Consistency in a Multiprocessor Computer System using Virtual Caching” as well as by his elevation to IEEE Fellow in 1999 and ACM Fellow in 2005, “for his contributions to the design of high-performance multiprocessor systems” and “for his contributions to multiprocessor memory system design,” respectively. In essence, every time we access memory and execute code that runs on more than one processor, we rely on foundations built by Michel and by others whose work he in some way inspired, influenced, and/or impacted.

Beyond his many significant scholarly contributions, Michel gave generously in professional leadership and service to our computer architecture community. He served as Program Chair of HPCA 1996 and ISCA 2001, as General Chair of ISCA’04 and General Co-Chair of HiPEAC’07, and as Area Editor and Guest Editor for the Journal of Parallel and Distributed Computing (JPDC) and IEEE Transactions on Computers (TC), to name just a few. He also served honorably on the IEEE Computer Society’s TCCA Executive Committee for ten years, from 2006 to 2015. During his 37 years at USC, Michel mentored over a dozen Ph.D. students through graduation, many of whom are now leaders in industry. He also served administratively as Director of the Computer Engineering Division in the Ming Hsieh Department of Electrical and Computer Engineering at USC Viterbi.

Michel is survived by his dear wife, Lorraine Dubois, as well as his cousins in Belgium and extended family members in the U.S., including Lorraine’s mother, brother, and sisters, and Lorraine’s two daughters, Jenna and Melanie.  

In Michel’s honor, a memorial symposium celebrating his life and scholarly contributions will be held on Saturday, October 29th, at USC in Los Angeles, California. The symposium will feature many short talks by computer architecture experts from around the world, as well as by his former students and USC colleagues. All are invited and welcome to attend; please visit the memorial website for additional information on the speaker list and schedule of events. While on that site, please also RSVP for the event using the RSVP link so we may reach out to you with timely updates. For those in the broader computer architecture community who cannot attend in person but wish to participate remotely, we encourage you to join the symposium virtually on Zoom. We also invite anyone who wishes to contribute a vignette about Michel to share a short 2- to 3-minute video clip; the Zoom links for virtual attendance and the video clip upload function will be available on the memorial website shortly and emailed to those who RSVP for the event.

Respectfully, 

Murali Annavaram, Per Stenström, and Timothy Pinkston.

]]>
SIGARCH/TCCA Best Practices & Resources https://www.blog.ieeetcca.org/sigarch-tcca-best-practices-resources/?utm_source=rss&utm_medium=rss&utm_campaign=sigarch-tcca-best-practices-resources Fri, 27 May 2022 16:23:27 +0000 http://www.blog.ieeetcca.org/?p=767 Scientific progress relies on sound reviewing processes.  The integrity of our review process is paramount, and it rests on our handling of ethical concerns and conflicts of interest.  These issues have been compounded by the rapid growth in the computer architecture community,  increases in reviewer load, and the introduction of virtual and hybrid conferences.  To support our conference organizers, the SIGARCH and TCCA Executive Committees have developed and updated several written guidelines and best practices documents.  These resources cover a range of strategic and tactical concerns for conference organizers, from committee formation and review assignments, to how to turn on logging and disable batch downloads in HotCRP.  This blog post provides a guided tour of these resources.

There are several parties involved in the conference reviewing process, including the authors, the reviewers, and the program chairs. In the past, defining the responsibilities of these parties was somewhat ad hoc and varied significantly between conferences. The role of the program chair is especially difficult, as the person(s) assigned to the role may be doing the task for the first time and may not be fully aware of the trade-offs involved in the many decisions associated with the reviewing process. Therefore, to ensure robustness, fairness, and consistency, SIGARCH and TCCA have jointly developed two documents. The first, SIGARCH/TCCA’s Recommended Best Practices for Conference Reviewing Process, outlines the responsibilities of authors, reviewers, and PC chairs with respect to critical ethical issues, including conflicts of interest. The second, SIGARCH/TCCA’s Recommended Best Practices for ISCA Program Chairs, outlines the elements of the review process and the decisions that fall specifically to the Program Chair. As with the earlier best practices, SIGARCH and TCCA encourage reviewers and PC chairs of any SIGARCH- or TCCA-sponsored conference, not just ISCA, to abide by the best practices outlined in these documents.

Conference organizers can also find the benefits and responsibilities of SIGARCH affiliation in the Guidelines for SIGARCH Conferences. Sponsored conferences have the financial backing of the SIG and consequently carry additional expectations relative to “in cooperation” conferences, where the affiliation is looser. Similarly, IEEE CS offers conferences either financial sponsorship or technical sponsorship. In the former, an IEEE CS Technical Committee is financially involved and there are certain expectations of the conference; in the latter, there is no IEEE CS financial liability but still significant technical cooperation. For more details, see IEEE Policies, pages 76 and 77.

In conjunction with these policy guides,  the SIGARCH and TCCA Executive Committees have released several public resources containing practical information for the community.  These are living documents that will be updated regularly, to which the community is invited to contribute suggestions. The SIGARCH/TCCA’s Resource Packet for General Chairs aggregates advice and tactical information for general chairs.  It also contains advice on critical tasks such as forming and vetting the conference organizing committee.  The companion SIGARCH/TCCA’s Resource Packet for PC Chairs compiles tactical information related to that role, such as HotCRP configuration advice and submission deadline selection.  Also useful for PC Chairs when forming the review committee, the Architecture PC Database (PCDB) is available online.  This database aggregates publicly available program committee information for ISCA, MICRO, ASPLOS, HPCA, and IEEE Micro TopPicks dating back to 2014.    We encourage our community to use this database to make our events more inclusive, distribute the review load, and broaden participation in the process.

About the Authors: 

Martha Kim is an Associate Professor of Computer Science at Columbia University and a member of the SIGARCH EC.

Moinuddin Qureshi is a Professor of Computer Science at Georgia Tech, and an Executive Committee member of IEEE TCCA.

]]>
HPCA 2022 Trip Report https://www.blog.ieeetcca.org/hpca-2022-trip-report/?utm_source=rss&utm_medium=rss&utm_campaign=hpca-2022-trip-report Mon, 18 Apr 2022 09:00:00 +0000 http://www.blog.ieeetcca.org/?p=759 This is a repost from the ACM SigArch blog.

Welcome to the trip report on the 28th IEEE International Symposium on High-Performance Computer Architecture (HPCA-28)! This marks the second fully-virtual HPCA (and, we hope, the last). With signs of the COVID-19 pandemic receding in November 2021, HPCA decided to move to April 2022 with the goal of holding an in-person conference. Unfortunately, the only thing predictable about COVID-19 is its unpredictability. For the second time in a row, General Chairs Jung Ho Ahn and John Kim worked tirelessly to organize another successful fully-virtual HPCA. HPCA-28 was hosted on Whova and Gather Town, drawing more than 1,255 attendees from 40 countries.

This year’s Program Chair, Stefanos Kaxiras, put together an exciting program. HPCA-28 received 273 submissions and accepted 80 papers (a 29% acceptance rate), making it the largest HPCA ever! In addition, this year’s HPCA was the first to adopt artifact evaluation, following the trend in other computer architecture and systems conferences to improve reproducibility in the community. This year’s program consisted of 3 keynotes (joint with PPoPP and CGO), 2 tutorials, 2 workshops, 24 paper sessions, and an awards ceremony.

Main Program

In total, the main program contained 80 papers in the main track, 5 papers in the Industrial session, and 3 presentations in Best of CAL, organized into 24 paper sessions across 3 parallel tracks. Roughly a third of the program’s papers focused on accelerators, another third on traditional topics (microarchitecture, caches, at-scale computing, NoCs, simulation, etc.), and the remainder on quantum, security, and memory.

Following a much-needed trend in other systems and computer architecture conferences, this marks the first time HPCA carried out Artifact Evaluation (AE), led by Alberto Ros and Tushar Krishna. 41.8% of submitted papers indicated an interest in AE, and 16.25% of accepted papers submitted to AE, with almost all of those papers receiving all three badges. We hope to see more members of the community participate in artifact evaluation in the future!

Best Paper Session

For the Best Paper Award, four papers were nominated. Vignesh Balaji, a research scientist at NVIDIA, presented “Improving Locality of Irregular Updates with Hardware Assisted Propagation Blocking”. The second talk, “Effective Mimicry of Belady’s MIN Policy”, was given by Ishan Shah, an undergraduate student at the University of Texas at Austin. Zhi-Gang Liu, a principal research engineer at ARM, presented his work “S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration”, a collaboration with the University of Rochester. The fourth and final paper, “SupermarQ: A Scalable Quantum Benchmark Suite”, was presented by Teague Tomesh, a fourth-year Ph.D. candidate at Princeton University.

Keynotes

Carrying on the tradition of past HPCA/PPoPP/CGO offerings, there were three thought-provoking interdisciplinary keynotes, one organized by each conference.

The first keynote (PPoPP) was by James Reinders, an engineer at Intel focused on enabling parallel programming for heterogeneous computing systems. He highlighted the challenges of increasingly diverse heterogeneous systems and the difficulty of programming them effectively. He then described current efforts to rein in the complexity of programming heterogeneous systems, such as the oneAPI initiative to standardize the programming of accelerated processing units (XPUs) and the SYCL Data Parallel Language. To fully meet the programmability challenges of future heterogeneous systems, James called on the community to come together and create an open, multi-vendor, multi-architecture, multi-language future.

The second keynote (CGO) was by Saman Amarasinghe, a Professor of EECS at MIT and a member of CSAIL. Over the past 30 years, there have been unprecedented advances in algorithms, systems, and program structure; however, compilers are still largely structured the same way. Saman presented a vision for Compiler 2.0, intended to inspire the compiler community to radically rethink how to build next-generation compilers and bring compiler technology into the 21st century. He then presented several possible examples of how to automate compiler construction, build Compilers as a Service, and use machine learning for cost modeling and program representation.

The third keynote (HPCA) was by Babak Falsafi, a Professor and the founding director of EcoCloud at EPFL. He first presented an overview of the modern server blade and highlighted how many aspects of modern servers are primarily derived from CPU-centric desktop PCs of the 80s. He then presented a vision for a clean-slate approach to designing servers based on three pillars—Integration, Specialization, and Approximation—to enable scalable servers in the post-Moore era.

Workshops and Tutorials

On the first weekend of April, there was a workshop and a tutorial each day. Our old friends DOSSA and CogArch returned as in previous years. This year DOSSA’s topic was “HW/SW Components for Domain Specific Systems”, while CogArch focused on the security and data-privacy-preserving aspects of AI/ML and related application domains. The first tutorial, from IBM, provided detailed guidance on using the public cloud for HPC workloads. The other tutorial aimed to give attendees a good sense of ML benchmarking through talks by industry and academic experts.

Awards

The awards ceremony was held on Wednesday. The Program Chair, Stefanos Kaxiras of Uppsala University, opened the event with a summary of the technical program. Antonio Gonzalez of Universitat Politècnica de Catalunya then announced the 10 members of the HPCA Hall of Fame for 2022: Lieven Eeckhout (Ghent University), Tushar Krishna (Georgia Institute of Technology), Hsien-Hsin S. Lee (Meta), Mike O’Connor (NVIDIA), Daniel J. Sorin (Duke University), G. Edward Suh (Cornell University / Meta), Guangyu Sun (Peking University), Carole-Jean Wu (Meta), Lixin Zhang (Freelance), and Huiyang Zhou (North Carolina State University). Congratulations!

The Distinguished Artifact Awards were presented to two groups: Ji Liu, Peiyi Li, and Huiyang Zhou for “Not All SWAPs Have the Same Cost: A Case for Optimization-Aware Qubit Routing”, and Zhaoying Li, Dan Wu, Dhananjaya Wijerathne, and Tulika Mitra for “LISA: Graph Neural Network Based Portable Mapping on Spatial Accelerators”.

Finally, after deliberation by the Best Paper Committee, the Best Paper Award went to “SupermarQ: A Scalable Quantum Benchmark Suite” by Teague Tomesh, Pranav Gokhale, Victory Omole, Gokul Subramanian Ravi, Kaitlin N. Smith, Joshua Viszlai, Xin-Chuan Wu, Nikos Hardavellas, Margaret Martonosi, and Frederic T. Chong.

Business Meeting

The business meeting covered updates and statistics from IEEE CS TCCA, CASA, and the organizing committee. This year IEEE CS TCCA started a Share Your Resume website where graduate students and postdocs can upload CVs to publicize availability for part-time and full-time jobs. This is available at http://ieeetcca.org/share-your-resume/

Elba Garza and Emily Ruppel also presented updates from CASA, highlighting many upcoming events, such as the DEI Summer Reading Group (Summer 2022), Academic Mental Health Workshop (Fall 2022), and Jobs Workshop (Fall 2022). A newer initiative that CASA is building is CALM, a long-term mentoring program for computer architecture. After a successful pilot program, CALM will be scaling up this summer. If you are interested in participating as a mentor or mentee, visit http://comparchmentoring.org/

Finally, the locations for the next HPCA/PPoPP/CGO/CC will be Montreal in 2023 and Edinburgh in 2024, with both planning for an in-person event. 

Acknowledgment

We would like to thank our amazing Ph.D. students, Jingyao Zhang and Nafis Mustakin, for helping us cover all the exciting events happening at HPCA 2022. Both of them attended various sessions and helped us write the article.

About the Authors: 

Elaheh Sadredini is an Assistant Professor in the Computer Science and Engineering Department at the University of California, Riverside. Her research interests lie at the intersection of computer architecture and security. 

Daniel Wong is an Assistant Professor in the Electrical and Computer Engineering Department at the University of California, Riverside. His research interests include energy-efficient computing, data center architectures, and GPGPUs.

]]>
How did the road to new adventure in Superconductor Neural Network Accelerator happen? https://www.blog.ieeetcca.org/how-did-the-road-to-new-adventure-in-superconductor-neural-network-accelerator-happen/?utm_source=rss&utm_medium=rss&utm_campaign=how-did-the-road-to-new-adventure-in-superconductor-neural-network-accelerator-happen Wed, 03 Nov 2021 02:23:55 +0000 http://www.blog.ieeetcca.org/?p=747 1: What is Superconductor SFQ?

Moore’s Law, the doubling of the number of transistors on a chip every two years, has so far driven the evolution of computer systems. Unfortunately, we can no longer expect sustained transistor shrinking, marking the beginning of the so-called post-Moore era. It has therefore become essential to explore emerging devices, and superconductor single-flux-quantum (SFQ) logic, which operates in a 4.2-kelvin environment, is a promising candidate. As shown in Figure 1, Josephson junctions (JJs) are used as switching elements in SFQ logic to compose a superconductor ring (SFQ ring) that can store (or trap) and transfer a single magnetic flux quantum. SFQ logic fundamentally operates in a voltage-pulse-driven manner, which makes it possible to achieve extremely low-latency (~10⁻¹² s) and low-energy (~10⁻¹⁹ J) JJ switching.

2: History of our SFQ Research

Although several researchers successfully demonstrated ultra-high-speed SFQ designs operating at over 100 GHz, the primary purpose of those designs was to prove the potential of SFQ circuits [SFQ BS]. From the viewpoint of computer architecture, we found several issues in such traditional designs: (1) their bit-serial nature, adopted to localize circuits and reduce wire length, causes a critical performance issue for the multi-bit operations that are usually required; (2) a unique characteristic of SFQ logic, namely that each SFQ logic gate has a latch function inherent in the SFQ ring, had not been considered; and (3) there was no discussion of effective performance that accounted for memory impacts. These were the starting points of our SFQ research from the viewpoint of computer architecture. In the early 2000s, SFQ was not well known in our community, so my friends joked, “Hey Koji, you should chill your head before dipping your chip in liquid helium at 4 kelvins!” Based on deep cross-layer discussions, we decided to pursue “bit-parallel gate-level pipelining (BP-GLP)” to solve issues (1) and (2). Some claimed that it would be impossible to maintain ultra-fast operation with a bit-parallel scheme, because complex, long wires make the timing constraints severe. However, in 2017, we successfully demonstrated our first BP-GLP ALU design, which operates at over 50 GHz while consuming 1.6 mW (see a demo video [SFQ Youtube]). To tackle issue (3), we started our current international collaboration between Kyushu University (KU), Seoul National University (SNU), and Nagoya University (NU). We targeted neural network acceleration because its stream-style computing is well suited to BP-GLP. Finally, we proposed SuperNPU [MICRO][TopPicks]. This collaboration worked very well thanks to two key Ph.D. students, Koki Ishida from KU and Ilkwon Byun from SNU, who drove the project.
For the kick-off, NU and KU members visited SNU in 2019, and Koki stayed at SNU for three months to accelerate our collaborative work.

3: SuperNPU was born in cross-layer computer architecture research

SuperNPU, as shown in Figure 2 (a), is our design for an SFQ-based neural processing unit (NPU) [MICRO]. The key was to achieve cross-layer interaction and optimization to define and explore the architectural design space efficiently and practically. NU and KU have accumulated experience in the design and prototyping of BP-GLP, as shown in Figure 3. All chips in this figure were fabricated with a 1.0 µm process technology, and correct operation was confirmed by measurement in a 4-kelvin environment. Based on such actual designs, we extracted device characteristics. Then, SNU and KU developed the simulation framework presented in Figure 2 (b) and performed architectural exploration and optimization. We believe that our team is the first (and best) to explore SFQ technology for cross-layer computer architecture research.

 4: What have we learned from SuperNPU?

Through the research of SuperNPU, we have learned a lot. First, bridging device/circuit-level considerations and architecture-level optimization is essential when exploring emerging-device computing such as SFQ. In particular, fabricating and measuring real chips is important for building accurate power/performance/area models. Because this is a new device technology, some things cannot be understood until a chip is actually manufactured, e.g., the impact of wiring on large-scale SFQ circuits operating at over 50 GHz, the effects of process variation, etc. Second, new device features affect many tradeoffs in computer systems, so revisiting the microarchitecture is a critical challenge. And last but not least, we should pursue wild, even crazy, challenges that of course carry a lot of risk but are so exciting. We chilled our heads (though not to 4 kelvins) and decided to go in this direction because it is truly promising! The SFQ process/fabrication technology is still immature due to a lack of investment; e.g., the most advanced feature size currently available to us is 1.0 µm, several generations behind CMOS. With significant advances in device technology, we can expect scaling benefits that would open a new door to extremely high-speed, power-efficient computing. It is a vital role of computer architects to fully exploit the potential of such emerging devices.

[SFQ BS] https://doi.org/10.1587/transele.E97.C.157

[SFQ Youtube] https://www.youtube.com/watch?v=jZP7sXWHyZs

[MICRO] https://ieeexplore.ieee.org/document/9251979

[TopPicks] https://ieeexplore.ieee.org/document/9395193

About the authors:

Koji Inoue:

He is a professor in the department of advanced information technology, and the director of the system LSI center, at Kyushu University, Japan. His research interests include power-aware computing, IoT system designs, supercomputing, and emerging device computing.

Jangwoo Kim:

He is a professor in the department of electrical and computer engineering at Seoul National University, Korea. His research interests include server and datacenter architectures, cryogenic computing, and system modeling methodologies.

Masamitsu Tanaka:

He is an assistant professor in the department of electronics at Nagoya University, Japan. His research interests include subterahertz-clock-frequency LSI design methodologies and classical and quantum computing using superconductor-based cryogenic electronics.

]]>
Happy Birthday, CASA!: A Retrospective https://www.blog.ieeetcca.org/happy-birthday-casa-a-retrospective/?utm_source=rss&utm_medium=rss&utm_campaign=happy-birthday-casa-a-retrospective Fri, 15 Oct 2021 15:19:22 +0000 http://www.blog.ieeetcca.org/?p=742 With MICRO upon us, I would like to take time to reflect on the events leading up to the establishment of CASA, the Computer Architecture Student Association. This month we will be celebrating our first anniversary as an active organization!

We at CASA feel we have made significant progress towards our goal of bringing students in computer architecture together and providing a strong support system for all. We are thankful for the support CASA has received from the community at large; you have helped us fund and run a series of successful initiatives and events throughout the past year. Before we look back on our establishment and highlight our activities of the past year, we would like to thank all of you who contributed to them, whether you were an organizer, a participant, or a sponsor. You have helped make CASA what it is.

CASA’s establishment did not occur overnight—it was the result of almost a year of planning, community outreach, discussions, and exploration among a group of dedicated students and supporters. For me, it all originally began as a fleeting idea while attending ISCA 2019. Initially, I imagined an affinity group focusing on students from historically marginalized or disadvantaged (e.g. first-generation, BIPOC, LGBTQ+, and disabled) backgrounds. However, with the murmurs of a great tragedy in the community and a later chance meeting with future co-founder Raghavendra (Raghav) Pothukuchi at MICRO 2019, the concept of the group evolved toward one for any and all students in computer architecture who seek a support system. 

For Raghav and me, CASA was a way to address the mental health crisis that has been silently growing in our student community. As we worked on this issue, students came forward with accounts of how they or someone they knew faced distressing experiences during their graduate studies. Of course, this problem extends beyond our community—in our research, we came across a study in Nature Biotechnology, which surveyed over 2000 graduate students from around 200 institutions and found that graduate students are “more than six times as likely to experience depression and anxiety as compared to the general population.” We want to make a change, and don’t want to leave any student behind. 

By May 2020, Raghav and I had written and published our proposal through IEEE TCCA, outlining what we felt were proper steps toward supporting students in our community. This proposal acknowledged the precarious position graduate students hold relative to others in academia, the elusive work-life balance in their lives, and, worse, how negative graduate student stereotypes, normalized unhealthy behaviors, and the stigma on mental health often remain barriers to seeking help. In June 2020, we published an article on Computer Architecture Today advertising the proposal and began to actively recruit students for the creation of the proposed student group.

With a core group of student volunteers and faculty support gathered, the summer of 2020 brought a flurry of online meetings, brainstorming sessions, and long Slack discussions. The final result was the announcement of the formation of the Computer Architecture Student Association at MICRO 2020. With a dedicated student Slack space and an active social media presence, CASA has been helping keep students in computer architecture connected and informed. I sincerely feel CASA has kept to its goals of supporting students and making our community more approachable.



The timeline figure above outlines the events and initiatives run by CASA over the past year. The following are highlights and write-ups of select events.


MaSS/MaSA: To help junior architecture students get to know other students and receive important advice, CASA teamed up with Prof. Joel Emer to extend the highly successful Meet-a-Senior-Architect (MaSA) program and launch the Meet-A-Senior-Student (MaSS) program at MICRO 2020. In the first iteration of MaSS, 84 junior students (undergraduate, masters, and 1st/2nd year Ph.D.) joined as mentees and 55 senior students (3rd+ year Ph.D.) signed up to become mentors. At ASPLOS 2021, 88 junior students joined as mentees and 54 senior students joined as mentors. Furthermore, CASA helped extend the MaSA program beyond ISCA and into ASPLOS 2021. We will again be helping run MaSA at MICRO 2021.

ArchChat Social Hour: To help the community stay connected amid the ongoing global pandemic, we introduced a recurring online social event called the ArchChat Social Hour. Established with help from the PLTea organizers, it ran as a monthly event throughout the first half of 2021. During ArchChat, attendees met and chatted with colleagues in randomized Zoom breakout rooms. Now that institutions are back in person, we hope to continue it on a quarterly or semesterly basis.

A screenshot of participants at the first ArchChat Social Hour. 

Mental Health & Ph.D. Studies Workshop: As outlined earlier, the mental health crisis among graduate students is a very real and serious issue. To bring attention to these issues, our initial proposal called for programming to help fellow students share, listen and be heard. Thus, with generous support from both ACM SIGARCH and IEEE TCCA, CASA announced and held a two-part event on the intersection of Ph.D. studies & mental health in March 2021. Presenters from Ph.D. Balance shared advice on how to address difficult academic situations, created a space for students to share academic experiences, and provided useful tips on navigating graduate school. While hosting a single event by no means solves the problems faced by students in our community, we plan to continue holding programming that helps fellow students better navigate the trials and tribulations brought on by academic studies.

Summer DEI Reading Group: Events in the computing community and society at large spurred CASA steering committee member Udit Gupta to organize a summer-long reading series focused on advancing diversity, equity, and inclusion (DEI) in computer architecture. Students and faculty members read relevant works and participated in discussions, sharing personal experiences and bringing awareness to diverse issues faced by many in our community. We are currently compiling a repository of the reading materials used, along with discussion guides, so that others can run their own groups.

Mentoring Publications & Future Initiatives: To further promote the development of successful mentoring initiatives in computer architecture, members of CASA explored current short-term mentoring programming (along with feedback from both MaSS and MaSA) and proposed developing long-term mentoring programming for computer architecture. We presented our resulting paper, Mentoring Opportunities in Computer Architecture: Analyzing the Past to Develop the Future, at the Workshop on Computer Architecture Education (WCAE 2021) at ISCA 2021.

In the paper, we highlight the importance of mentor-mentee relationships and emphasize the long time frame necessary to solidify them. While the computer architecture community currently hosts many short-term mentoring opportunities, long-term mentoring programming is not readily available. We also demonstrate how mentorship relationships are particularly important for students from historically marginalized backgrounds. It is therefore crucial to form long-term mentoring programs to retain these students and thus develop a more diverse computing body in the future. 

As such, CASA is excited to announce our most recent initiative: the Computer Architecture Long-term Mentoring Program, or CALM. CALM is the result of the aforementioned research, analysis, and outreach. This initiative matches mentors with mentees from the greater computer architecture community for longer-term mentoring (e.g., one year). To learn more about the CALM Pilot Program, long-term mentoring, and how to participate, join us at one of the two CALM Kick-Off events we will host during MICRO 2021. 

I am beyond proud of what CASA and its members have achieved in the past year. CASA would be nowhere without the dedication and time graciously given by its steering committee and supporting faculty; I thank each and every one of you profusely. Co-founding CASA with Raghav has been an honor, and both my personal and professional lives are richer for it. I hope our student members feel the same! 

CASA’s future looks bright with new initiatives and cohorts of members on the horizon. Recent generous financial support from the architecture community will ensure CASA’s mission continues, especially as we return to in-person conferences and programming. My long-term vision is that student-led initiatives and communities like CASA become the norm in all computing research communities, and that we will one day look back at the time when they did not exist with shock, and with gratitude for the present. For now, I am content with computer architecture leading the way toward this vision.

Elba Garza, TAMU.

Message from the Inaugural IEEE SEED 2021 Program Chairs
https://www.blog.ieeetcca.org/message-from-the-ieee-seed-2021-inaugural-program-chairs/
Tue, 14 Sep 2021 22:18:01 +0000

Dear Computer Architecture Community,

We take this opportunity to invite you all to the first-ever edition of the IEEE International Symposium on Secure and Private Execution Environment Design (SEED). This blog post provides an overview of the inaugural edition of IEEE SEED 2021 and its program. The conference is sponsored by TCCA and will be held virtually. The technical program consists of 26 accepted papers and two keynote talks.

There were four categories for submission: (1) regular papers of up to 11 pages, (2) Systematization of Knowledge (SoK) papers of up to 11 pages, (3) Seeds of SEED papers of up to 6 pages, and (4) Work-in-Progress (WiP) papers of up to 6 pages. The primary focus of regular papers was to describe new research ideas at the intersection of computer architecture/systems and security, supported by experimental implementation and evaluation. SoK papers would mainly evaluate, systematize, and contextualize existing knowledge on computer system security research topics. The primary focus of “Seeds of SEED” papers was to describe promising designs, initial development, and preliminary evaluation of new ideas critical to the security of future architectures/systems; contributions from industry that bring awareness to a new security problem and/or lay out a vision for sound architecture/systems security principles were especially encouraged in this category. WiP papers described novel secure system designs supported by experiments. Authors of papers accepted under the WiP category were informed that they would have an opportunity to present at the conference and receive feedback, and that only the title and a brief abstract would be displayed on the conference website. This was done to help these ideas mature into full-length papers in the future.

The SEED program committee reviewed a total of 33 submissions, along with 10 invited submissions, from a broad spectrum of the computer architecture and systems research communities. The review committee consisted of 28 members. Paper review assignments were made by carefully taking conflicts of interest into account. Each regular submission received at least 3 reviews, and additional reviews were solicited for papers that did not have consistent scoring across the reviewers. There was also extensive post-review online discussion among the reviewers to seek all of their opinions. Final decisions for each paper were made through unanimous consent among all of those who had reviewed the submission. If a subset of reviews pointed to major concerns that could not be addressed before the conference but the submission showed good potential overall, we sought the authors’ feedback on whether they were willing to present the work as a WiP paper. In the end, 13 papers were accepted as regular full-length papers, along with 10 Seeds of SEED and 3 WiP papers. The program captures many of the recent trends in computer systems and architecture security, while also representing some of the more classical themes. Privacy-enhancing computing is an emerging topic capturing the community’s attention, and we will have a roundtable session on it. The program also includes papers on recent topics spanning side channels, memory systems, and safety. The first keynote, by Prof. Milos Prvulovic of Georgia Tech, will reflect upon analog side channels, which are far less understood and present greater dangers in the future. The second keynote, by Prof. Ahmad-Reza Sadeghi of TU Darmstadt, will discuss the future of Trusted Execution Environments and their requirements in an ever-changing security landscape.

We take this opportunity to thank everyone who helped us put together the SEED’21 technical program. We thank all the authors for their submissions to SEED. We could not have put together such a strong program without the hard work of the program committee, whose members were generous with their time, provided detailed and insightful reviews, and actively participated in the subsequent online discussions. We thank the IEEE Technical Committee on Computer Architecture (TCCA) Executive Committee for supporting our venture to inaugurate the SEED conference. We express our sincere thanks to Prof. Jakub Szefer and Prof. Yan Solihin, the General Chairs, for their meticulous efforts in organizing the first edition of the SEED conference. We express our special thanks to Dr. Wenjie Xiong, the publications chair, for compiling the proceedings. Finally, we would like to thank all the attendees and offer our warm welcome to this inaugural edition of the SEED conference. We sincerely hope that you will enjoy, learn, and benefit from the program that we have put together. 

Guru Prasadh Venkataramani, George Washington University
Yinqian Zhang, SUSTech
Inaugural SEED 2021 Program Chairs

Advancing and Promoting DEI in Computer Architecture – Summer 2021 Reading Group
https://www.blog.ieeetcca.org/advancing-and-promoting-dei-in-computer-architecture-summer-2021-reading-group/
Sat, 05 Jun 2021 22:27:31 +0000

Across the computer systems and architecture community, there has been A Call to Action to advance and promote diversity, equity, and inclusion (DEI) values through systemic change. Toward this end, HPCA 2021, PPoPP 2021, CGO 2021, and CC 2021 held a joint panel session on Valuing Diversity, Equity, and Inclusion in Our Computing Community to discuss paths forward across academia and industry. Recent blog posts have raised awareness and advocated for greater DEI efforts, including: Gender Diversity in Computer Architecture, Statement on Diversity at MICRO-50, What Happens to Us Does Not Happen to Most of You, Inclusion and Conference Governance, and Chilly Climate in Computer Architecture?. Building an inclusive and safe research community requires collective and continued effort from all members. 

This summer, the Computer Architecture Student Association (CASA) is organizing the “Advancing and Promoting DEI in Computer Architecture Summer Reading Group”. The goals for the reading group are to (1) provide a venue for computer architecture researchers and practitioners to study and discuss challenges and paths for broadening participation in computer architecture, (2) encourage a data-driven approach to understanding DEI in computer architecture by reading and analyzing scholarly work, (3) create a diverse community where everyone has the opportunity to share their experiences and perspectives, and (4) when possible, invite guest speakers to shed light on their work related to DEI in STEM and computing. 

Who can join? The summer reading group is open to all: students, post-docs, faculty, and industry practitioners. In addition to computer architecture, we welcome anyone in computer systems and computing more generally. 

When is the reading group? The reading group will be held over six to seven sessions during summer 2021. To enable attendance across different schedules and time zones, the sessions will be held at different times of the week throughout the summer, roughly every two to three weeks. We have provided the tentative schedule below.

How is each session structured? Each session will be around 1 hour long. Central to this reading group is ensuring that anyone interested in joining has the opportunity to do so. As such, each session will be split into two components, and no pre-meeting reading or literature sourcing is required. During the first 20 minutes of the session, we will read the paper, book chapter, or article together (following copyright and fair use guidelines). In the latter 40 minutes, members will discuss the presented material in groups of no more than 25; breakout rooms will be used if attendance is larger than 25. 

We hope splitting the session into two distinct components will encourage individuals that may not otherwise be able to read the material beforehand to participate as well. Individuals that have already read the material on their own may join the second half directly. CASA members and volunteers will facilitate discussions for each session.

Tentative schedule: 

- June 10, 12pm–1pm ET (Registration Link): A Rising Tide of Hate and Violence against Asian Americans in New York During COVID-19: Impact, Causes, Solutions (Foreword) by The Asian American Bar Association of New York. Optional: StoryCorps: Gary Koivu on His Best Friend, Vincent Chin; A Silent Technical Advantage by Philip Guo; I Am Not Your Asian Stereotype by Canwen Xu
- June 21, 1pm–2pm ET (Registration Link): Stuck in the Shallow End by Jane Margolis
- July 9, 2pm–3pm ET (Registration Link): The Stereotypical Computer Scientist: Gendered Media Representations as a Barrier to Inclusion for Women by Cheryan et al.; What Happens to Us Does Not Happen to Most of You by Kathryn S. McKinley
- July 19, time TBA: reading TBA
- August 2, time TBA: reading TBA
- August 16, time TBA: reading TBA

How to participate? There are many ways to participate in the summer reading group. First, everyone is more than welcome to join each announced session via the registration links above. Next, as you can see from the schedule above, readings for the last three sessions have not yet been finalized.

For the yet-to-be-announced sessions later in the summer, sign up for notifications and information via this Google Forms registration link. In addition, if you have a topic, book chapter, article, or other scholarly work you would like to suggest, please reach out to CASA (info@comparchsa.org). We hope to encourage discussions across a diverse range of experiences in computing. Finally, if you would like to volunteer to help guide discussions during the sessions please also reach out to CASA. We are looking forward to working with the community on organizing this summer reading group!

Hope to see you at the first summer reading group on June 10th!

About the Authors: The Computer Architecture Student Association (CASA) is an independent student-run organization with the express purpose of developing and fostering a positive and inviting student community within computer architecture.
