This document summarizes a lecture on file systems and performance. It covers the read/write process for magnetic disks, breaking access time into seek time, rotational latency, and transfer time, and gives typical values for each. Flash/SSD storage is discussed as an alternative technology whose advantages include low latency, high throughput, and no moving parts, but which has drawbacks such as limited write endurance. The lecture then introduces concepts from queueing theory for analyzing the performance of I/O systems, such as modeling request arrivals and service times as probabilistic distributions, and discusses key metrics, response time and throughput, for evaluating I/O performance.
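To make the two calculations concrete, here is a minimal sketch in Python. All of the numbers in it (5 ms average seek, 7200 RPM, 100 MB/s transfer rate, 4 KB requests, an 80 requests/s arrival rate) are assumed, typical-textbook values chosen for illustration, not figures taken from the lecture. The first part adds up seek, rotational, and transfer time for a magnetic disk; the second applies the standard M/M/1 queueing formula for mean response time.

# Illustrative numbers only -- assumed typical HDD parameters, not from the slides.
avg_seek_ms = 5.0            # average seek time
rpm = 7200                   # spindle speed
transfer_mb_per_s = 100.0    # sequential transfer rate
request_kb = 4.0             # request size

# Rotational latency: on average, half a revolution.
rotation_ms = 0.5 * (60_000.0 / rpm)                              # ~4.17 ms
transfer_ms = (request_kb / 1024.0) / transfer_mb_per_s * 1000.0  # ~0.04 ms
access_ms = avg_seek_ms + rotation_ms + transfer_ms
print(f"disk access time ~= {access_ms:.2f} ms per 4 KB request")

# M/M/1 queueing sketch: requests arrive at rate lam and are served at rate
# mu = 1 / service_time.  Utilization u = lam / mu, and the mean response
# time (queueing delay plus service) is T = service_time / (1 - u),
# valid only while u < 1.
service_s = access_ms / 1000.0
mu = 1.0 / service_s          # ~109 requests/s
lam = 80.0                    # assumed arrival rate, requests/s
u = lam / mu
T = service_s / (1.0 - u)
print(f"utilization = {u:.2f}, mean response time ~= {T * 1000:.1f} ms")

Raising lam toward mu in this sketch shows the response time growing without bound as utilization approaches 1, which is the basic behavior the response-time and throughput metrics from queueing theory capture.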
1. CS162 Operating Systems and Systems Programming
Lecture 19: Filesystems 1: Performance (Con’t), Queueing Theory, Filesystem Design
April 5th, 2022
Prof. Anthony Joseph and John Kubiatowicz
http://cs162.eecs.Berkeley.edu