Tutorials

The following tutorials will be held during the conference:

The ∆QSD Paradigm: Designing Systems with Predictable Performance at High Load

Authors: Peter Van Roy (Université catholique de Louvain, Belgium) and Seyed Hossein Haeri (University of Bergen, Norway)

Abstract

The ∆Q Systems Development paradigm (∆QSD) is an industrially derived methodology for developing complex real-world distributed systems that embeds statistically based performance metrics from the outset of the system design process and throughout the entire software production life cycle. It uses a stochastic approach to specify system behaviour, using cumulative distribution functions to model both delay and failure. Experience shows that this is a ‘sweet spot’ that gives good results with respect to the amount of computation needed. Predictions are accurate when the system model correctly captures both independent and dependent parts. The paradigm has been developed by the company PNSol over a period of 20+ years, in collaboration with IOG (formerly IOHK), BT, Vodafone, Boeing Space and Defence, and other major companies that focus on the development of reliable, high-quality, high-integrity distributed software systems with strong real-time requirements. In particular, it:

  • is outcome-centric and especially concerns itself with the timeliness (and probability of success) of an activity of interest.
  • works from validating initial goals through to in-life service assurance.
  • permits top-down and bottom-up (or a mixture) design approaches.
  • can formulate both system-centric and user-centric “experience” questions.
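The stochastic modelling idea above can be sketched in a few lines: an outcome’s ∆Q is an “improper” delay distribution whose total probability mass, when below 1, encodes failure, and sequential composition of outcomes corresponds to convolution of their delay distributions. The following is an illustrative sketch only, not code from the tutorial; the delays and probabilities are invented.

```python
import numpy as np

MAX_BINS = 200  # delay discretized into 1 ms bins

def outcome(delays_ms, probs):
    """An 'improper' delay PMF: total mass below 1 encodes failure."""
    pmf = np.zeros(MAX_BINS)
    for d, p in zip(delays_ms, probs):
        pmf[d] += p
    return pmf

def seq(a, b):
    """Sequential composition: delays add, so distributions convolve."""
    return np.convolve(a, b)[:MAX_BINS]

# A network hop: 10 ms with prob 0.70, 30 ms with prob 0.29, 1% loss.
hop = outcome([10, 30], [0.70, 0.29])
round_trip = seq(hop, hop)            # two hops in sequence

success = round_trip.sum()            # overall probability of success
cdf = np.cumsum(round_trip)           # ∆Q as a CDF over delay
```

In the full ∆QSD calculus, probabilistic choice and first-to-finish composition are handled analogously, and failure is simply the mass missing from the top of the CDF.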

∆QSD primarily targets systems with many independent users where real-time performance is important, including systems with large flows of mostly independent data items and systems subject to frequent overload. Application areas include:

  • service assurance and strategic planning of national broadband deployments.
  • validating potential effectiveness of safety-of-life distributed systems in adversarial environments.
  • design, development, and deployment of the largest proof-of-stake based ledger technology where assuring timeliness is a key security property.

Short Bio

Peter Van Roy (peter.vanroy@uclouvain.be) is a professor in the ICTEAM Institute at the Université catholique de Louvain (UCLouvain), where he heads the Programming Languages and Distributed Computing Research Group. He coordinated the EU projects SELFMAN and LIGHTKONE and was a partner in the projects EMJD-DC, SYNCFREE, MANCOOSI, EVERGROW, and PEPITO. He is a developer of the Mozart Programming System and author of a well-known textbook on computer programming published by MIT Press. P. Van Roy is developing the ∆QSD tutorial as the main instrument for the dissemination of the ∆QSD paradigm. He has given four versions of this tutorial at international conferences since 2022, namely at DisCoTec 2022, HiPEAC 2022, EuroPar 2022, and HiPEAC 2023. He also gives lectures on ∆QSD in his master-level distributed systems course LINFO2345 at UCLouvain (videos on YouTube channel @PeterVanRoy).

Seyed Hossein Haeri (hossein.haeri@gmail.com) is an associate professor in the BLDL institute at the University of Bergen, Norway, as well as a software scientist at IOG, Singapore, and a language specialist at Entropy Software Foundation, US. H. Haeri is a theoretical computer scientist who is developing the mathematical foundation of ∆QSD. His research lies at the intersection of Programming Languages and Software Engineering; over the past decade it has also taken on a Distributed Systems flavour. In the recent past, he has delivered a dozen ∆QSD talks at different venues, including QAVS 2022, ICE 2023, and NWPT 2023.

Software Performance Analysis – Industry Perspectives

Authors: Kingsum Chow (Zhejiang University, China), Chengdong Li (Optimatist Technology, China), Anil Rajput (Advanced Micro Devices, United States) and Xinyu Jiang (Zhejiang University, China)

Abstract

Over the decades, software performance analysis has become a fundamental aspect of systems architecture. This shift came to the forefront with the accelerating growth in the number of cores per CPU socket, necessitating a higher level of understanding and practice in the field. The number of cores per socket, specifically in CPUs developed by industry leaders like Intel, AMD, and ARM, has increased significantly over the years. Servers in 2022-2023 from AMD, ARM, and Intel have shown a trend towards more cores, leading to better performance, with AMD and Intel competing fiercely in this realm. AMD has adopted increasing core counts as a primary way to save power and cost, achieving up to 96 Zen 4 cores in Genoa and 128 Zen 4c cores in Bergamo. At Hot Chips 2023, Intel announced its 144-core Xeon Sierra Forest and, as its CEO Pat Gelsinger stated, aims for higher core-count versions, driving up the per-socket Aggregate Selling Price (ASP) more aggressively. ARM, on the other hand, has focused on the development of server processors with large clusters of CPU cores, enhancing their general-purpose performance from cloud computing to high-performance computing; for instance, Ampere’s Altra Max, based on ARM architecture, offers up to 128 cores per socket. Overall, this data suggests a growing trend and demand toward multi-core CPUs, which enhances parallel processing capabilities, resulting not only in superior performance but also in improved power efficiency. This drastic increase in the number of cores per socket emphasizes the need for advanced performance analysis to manage and utilize these resources effectively. Such developments in hardware design are a testament to the ingenuity and relentless innovation prevalent in the tech industry. However, they also pose a critical challenge for software performance analysis: a stark rise in its complexity, since the task at hand now involves an increased number of CPUs, extensive data sets, complex workflows, and large-scale systems, all of which are integral parts of modern computing architectures.

Performance benchmarking and testing are crucial for ensuring software quality. Typically, they involve using specific indicators alongside benchmarks or workloads, while engineers also employ system tools to gather data on system resource utilization. However, we have observed that many engineers misinterpret the performance data generated by these collection tools due to an insufficient understanding of their underlying mechanisms. In this tutorial, we will unveil the mechanisms powering these performance data collection tools by identifying common patterns, and we will discuss the impact of different mechanisms through several illustrative cases. Attendees will learn how performance tools can lead engineers astray, and why understanding performance data collection mechanisms matters.
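As a toy illustration of how a collection mechanism shapes the data it reports (an invented example, not one from the tutorial), consider a tool that averages CPU utilization over one-minute windows versus one that samples every second:

```python
# A bursty workload: 100% busy for 6 s, then idle for 54 s.
samples = [100.0] * 6 + [0.0] * 54

# A coarse tool reporting one averaged number per minute
# sees a lightly loaded machine...
minute_avg = sum(samples) / len(samples)   # 10.0 %

# ...while per-second sampling reveals saturation bursts that
# explain latency spikes the average cannot.
peak = max(samples)                                        # 100.0 %
saturated_seconds = sum(1 for s in samples if s >= 99.0)   # 6
```

Both tools are “correct”; they simply measure different things, which is exactly why knowing the collection mechanism matters before drawing conclusions.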

Short Bio

Kingsum Chow (kingsum.chow@gmail.com) is a professor at the School of Software Technology, Zhejiang University. He received his Ph.D. in Computer Science and Engineering at the University of Washington in 1996. Prior to joining Zhejiang University in 2023, Kingsum worked as a chief scientist and senior principal engineer in industry. He has extensive experience in software-hardware co-optimization from thirty years of working at Intel and Alibaba. He delivered two QCon keynotes and appeared four times in JavaOne keynotes. He has been issued 30 patents and has delivered more than 100 technical presentations. He has collaborated with many industry groups, including groups at Alibaba, Amazon, AMD, Ampere, Appeal, Arm, BEA, ByteDance, Facebook, Google, IBM, Intel, Microsoft, Netflix, Oracle, Siebel, Sun, Tencent and Twitter. In his spare time, he volunteers to coach multiple robotics teams to bring the joy of learning Science, Technology, Engineering and Mathematics to K-12 students in the USA and China.

Chengdong Li (chengdongli@optimatist.com) is a performance engineer, with more than 13 years of industry experience. He led and built several large-scale performance tools with both Tencent and Alibaba. In addition to his expertise in performance engineering, he is also interested in debugging, programming languages, and software-hardware co-optimization. He is the founder and CEO of Optimatist Technology, helping customers improve their hardware resource utilization and optimize workload running performance.

Anil Rajput (anil_Rajput@yahoo.com) is an AMD Fellow, Software System Design, serving as a core architect for datacenter and cloud with a focus on performance, deployments, optimizations, and best practices. He received his certification in data analytics from the Harvard Business Analytics Program in 2022 and his Master’s in Electrical and Computer Engineering from Portland State University in 1997. Currently, Anil’s focus areas are workload characterization, platform evaluation, cloud deployments, and on-prem datacenters, as well as understanding and resolving large deployment issues at scale for critical customers. Earlier, he spent more than 20 years at Intel Corporation, playing various roles in the Software and Services Group, leading platform design, managed runtimes like Java and .NET, scripting languages, and the development of representative benchmarks as chair of the Java committee at SPEC. He was a key member of the teams that architected and developed several benchmarks such as SPECjbb2005, SPECjvm2008, SPECjEnterprise2010, and SPECpower_ssj2008. Anil also mentors graduate students and participates in local high school science fairs in Oregon, USA, to encourage kids toward STEM.

Xinyu Jiang (bernardjiang5@outlook.com) is a postgraduate student at Zhejiang University. His advisor is Kingsum Chow. His research interest focuses on system performance analysis and optimization.

DTraComp: Distributed Trace Compare

Authors: Maryam Ekhlasi (Polytechnique Montreal, Canada) and Nasser Ezzati-Jivan (Brock University, Canada)

Abstract

Microservice architectures improve software development through the use of diverse programming languages and deployment models, the containment of failures to specific services, and faster identification and resolution of issues in separate services. However, identifying the source of performance issues is difficult due to the numerous interacting service instances and the complexities introduced by parallelism. While end-to-end tracing provides a way to follow execution paths and identify latency across services, it falls short in identifying specific root causes of performance lags between processes. Furthermore, the lack of a comparison feature in many performance analysis tools prevents a detailed understanding of performance variances between different sets of requests. We will present DTraComp (Distributed Trace Compare), an open-source tool that works with various microservice tracing standards and integrates with Eclipse Trace Compass™. DTraComp enhances the analysis of distributed systems by offering a powerful visual comparison of two sets of executions, including parallel nested spans, and delivers detailed system kernel information for each thread in every span. This depth of analysis helps in pinpointing specific causes of performance issues across distributed systems.
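The core comparison idea can be illustrated in the abstract. The sketch below is not DTraComp’s API; it is a minimal, hypothetical example of comparing per-operation span durations between two trace sets to surface the operation that regressed most (all names and numbers are invented):

```python
from statistics import median

# Two trace sets, each mapping a span name to its observed
# durations in ms. Purely hypothetical data.
baseline  = {"auth": [4, 5, 5], "db.query": [12, 13, 15], "render": [8, 9]}
regressed = {"auth": [4, 5, 6], "db.query": [40, 42, 47], "render": [8, 10]}

def compare(a, b):
    """Per-operation median delta, largest regression first."""
    deltas = {op: median(b[op]) - median(a[op]) for op in a if op in b}
    return sorted(deltas.items(), key=lambda kv: -kv[1])

report = compare(baseline, regressed)
# The db.query spans stand out as the likely root cause.
```

A tool like DTraComp goes much further, comparing nested span hierarchies visually and attaching kernel-level detail to each thread, but the aggregate-then-diff step above is the essence of why comparison matters for root-cause analysis.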

Short Bio

Professor Nasser Ezzati-Jivan is an esteemed faculty member at Brock University’s Department of Computer Science, known for his pioneering research across multiple domains of computing. His expertise spans Software Debugging and Monitoring, Performance Engineering, Software Tracing, Distributed Multicore Systems, Cloud Computing and Virtualization, and Streaming Data Analysis. Through his work, Professor Ezzati-Jivan aims to solve complex computing problems, enhance software reliability, and improve system performance. His contributions to the field are marked by a deep commitment to innovation and excellence, positioning him as a leading figure in computer science research and education. Email address: nezzatijivan@brocku.ca

Maryam Ekhlasi is a Ph.D. candidate at Polytechnique Montréal, specializing in software performance with a focus on complex distributed systems. With over eight years of experience as a software designer and developer, including experience working at Ericsson, she has developed expertise in performance metrics and Linux kernel events. Her work involves collaborating with major high-tech companies to address performance issues and enhance cloud efficiency. Additionally, she is recognized as one of the presenters at the Ericsson Developers Conference, demonstrating a practical application of her research in diagnosing software performance. Email address: Maryam.ekhlasi@polymtl.ca