WIR SCHAFFEN WISSEN – HEUTE FÜR MORGEN
OpenCAPI-based image analysis pipeline for 18 GB/s kHz-framerate X-ray camera at the SLS synchrotron
Filip Leonarski :: Beamline Data Scientist :: Macromolecular Crystallography
Page 1
• Introduction: Macromolecular crystallography at synchrotrons and X-ray
detectors
• Technology: POWER + OpenCAPI
• Solution: Jungfraujoch
Plan
Page 2
1901 Nobel Prize
W. Röntgen
Discovery of X-rays
(Photo 51 by R.
Gosling and R.
Franklin)
1962 Nobel Prize
F. Crick, J. Watson and
M. Wilkins
Structure of DNA
double helix solved
with X-rays
2009 Nobel Prize
V. Ramakrishnan*, T. Steitz, A. Yonath*
Structure of ribosome
(*) some of their structures
were solved at PSI
Wikipedia:
X-ray crystallography is the experimental science determining the atomic and
molecular structure of a crystal, in which the crystalline structure causes a beam of
incident X-rays to diffract into many specific directions. By measuring the angles
and intensities of these diffracted beams, a crystallographer can produce a three-
dimensional picture of the density of electrons within the crystal.
X-ray macromolecular crystallography (MX)
Page 6
• Particle accelerators are the source of the brightest X-ray beams (multiple orders of magnitude brighter than conventional X-ray tubes): charged particles emit X-rays when they travel through a magnetic field
- This effect is a nuisance for high-energy physics (undesirable energy loss),
- but it is a blessing for structural science => modern storage rings are built exclusively as light sources.
• Synchrotrons provide a continuous X-ray beam, while X-ray free-electron lasers produce bright femtosecond-long pulses
MX at synchrotron
Page 7
Paul Scherrer Institute
Page 8
SwissFEL
Swiss Light
Source
Swiss Alps
• 3 experimental stations at the synchrotron
• 1 experimental station at the SwissFEL
• Beamtime is shared between academic and
industrial users
- Industrial customers are mostly pharmaceutical
companies looking for drug binding to potential
drug targets
- Academic users are universities and scientific
institutes worldwide doing basic research in
structural biology
MX at Swiss Light Source and SwissFEL
Page 9
• New storage ring to be installed in 2024-2025
• Flux (photons/second) will increase by an order of magnitude
• Measurements can be done 10x faster
• Enables the fragment screening method, i.e. a single protein target is crystallized with hundreds or thousands of molecular fragments to find the best drug
- This is like molecular docking, but fully experimental
Major upgrade in 2024/2025 for SLS 2.0
Page 10
• PSI is a major detector developer
- Hybrid pixel detectors were developed for CERN high-energy physics experiments
- The design could be used for X-ray cameras: first PILATUS in the 2000s
- The PSI start-up Dectris commercialized the PILATUS and EIGER detectors; most synchrotrons are equipped with their detectors
• PSI is currently rolling out a new generation: JUNGFRAU
Page 11
New detector for SwissFEL and SLS 2.0
• Silicon sensor converts X-rays to electric charge
• An ASIC with dedicated electronics for each pixel is bump-bonded to the sensor
• Each pixel has three capacitors allowing different amplification
• They are switched dynamically during exposure to adjust to the incoming charge
Page 12
Adaptive gain detector to increase dynamic
range
Aim: measure reliably from 1 to 20,000,000 photons per second
Page 13
Adaptive gain detector to increase dynamic
range
Pixel output in JF (16 bits): 0001010111110011
Gain (upper 2 bits): 00 = G0, 01 = G1, 11 = G2
ADC value: lower 14 bits
Photon number = $\frac{\mathrm{ADC} - \mathrm{pedestal}}{\mathrm{gain} \times \mathrm{photon\ energy}}$
Gain and pedestal factors are specific for pixel and gain setting
- Gain: prior calibration
- Pedestal: dedicated dark run
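The read-out decoding and the photon-number formula above can be sketched in a few lines of Python. This is only an illustration: the function names, the per-gain lists and the calibration numbers are hypothetical, and real gain and pedestal factors are specific to each pixel.

```python
# Map the 2-bit gain code from the slide to a gain level index (G0, G1, G2).
GAIN_LEVELS = {0b00: 0, 0b01: 1, 0b11: 2}


def decode_pixel(raw):
    """Split a 16-bit JUNGFRAU word into (gain level, 14-bit ADC value)."""
    gain_bits = (raw >> 14) & 0b11
    adc = raw & 0x3FFF
    if gain_bits not in GAIN_LEVELS:
        raise ValueError(f"unexpected gain bits {gain_bits:02b}")
    return GAIN_LEVELS[gain_bits], adc


def photon_count(raw, pedestal, gain, photon_energy_kev):
    """Photon number = (ADC - pedestal) / (gain * photon energy).

    pedestal and gain are per-gain-level lists for ONE pixel
    (illustrative layout; gain is assumed to be in ADC units per keV).
    """
    level, adc = decode_pixel(raw)
    return (adc - pedestal[level]) / (gain[level] * photon_energy_kev)


if __name__ == "__main__":
    word = 0b0001010111110011  # example word from the slide
    print(decode_pixel(word))
    # Calibration values below are made up for the example.
    print(photon_count(word, [819.0, 0.0, 0.0], [40.0, 1.0, 1.0], 12.0))
```

On the FPGA the same arithmetic runs per pixel with the calibration constants held in on-board memory; the sketch shows only the data flow.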
• Detector is modular
• 524,288 pixels per module
• 2.2 kHz * 524,288 pixels * 16 bit = 2.3 GB/s
- 2 x 10 Gbit/s links
• 4 Mpixel detector (2020)
- 16 x 10 Gbit/s
• 10 Mpixel (2022)
- 40 x 10 Gbit/s
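The data rates quoted above follow directly from the module size, frame rate and pixel depth. A minimal sketch of the arithmetic, assuming two 10 Gbit/s links per module as stated on the slide (function names are illustrative):

```python
PIXELS_PER_MODULE = 524_288
FRAME_RATE_HZ = 2_200
BYTES_PER_PIXEL = 2          # 16-bit read-out
LINKS_PER_MODULE = 2         # 2 x 10 Gbit/s per module


def module_rate_gb_s():
    """Raw data rate of one module in GB/s (10^9 bytes per second)."""
    return PIXELS_PER_MODULE * FRAME_RATE_HZ * BYTES_PER_PIXEL / 1e9


def detector_rate_gb_s(n_links):
    """Raw data rate of a detector described by its number of 10 Gbit/s links."""
    modules = n_links // LINKS_PER_MODULE
    return modules * module_rate_gb_s()


if __name__ == "__main__":
    print(f"module:    {module_rate_gb_s():.2f} GB/s")
    print(f"4 Mpixel:  {detector_rate_gb_s(16):.2f} GB/s")
    print(f"10 Mpixel: {detector_rate_gb_s(40):.2f} GB/s")
```

One module produces about 18.5 Gbit/s, which is why it needs two 10 Gbit/s links; 8 and 20 modules give roughly the 18.4 GB/s and 46.1 GB/s quoted elsewhere in the slides.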
Page 14
Modular detector
4 Mpixel (2020)
10 Mpixel (2022)
Page 15
MX detector data rates double every 2 years
[Plot: "Frame rate [GB/s]" on a logarithmic axis (0.1 to 100) versus "Year" (2006 to 2024)]
Year  Detector             Size        Frame rate  Data rate
2007  PSI PILATUS          6 Mpixel    12.5 Hz     0.2 GB/s
2014  Dectris EIGER        16 Mpixel   133 Hz      3.4 GB/s
2019  Dectris EIGER 2 XE   16 Mpixel   400 Hz      13.5 GB/s
2020  PSI JUNGFRAU         4 Mpixel    2200 Hz     18.4 GB/s
2022  PSI JUNGFRAU         10 Mpixel   2200 Hz     46.1 GB/s
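The "double every 2 years" claim can be checked against the first and last rows of the list above. The sketch below assumes simple exponential growth between two data points; the function name is illustrative.

```python
import math


def doubling_time_years(year0, rate0, year1, rate1):
    """Doubling time implied by exponential growth between two points."""
    return (year1 - year0) * math.log(2) / math.log(rate1 / rate0)


if __name__ == "__main__":
    # 2007 PILATUS 6 Mpixel (0.2 GB/s) to 2022 JUNGFRAU 10 Mpixel (46.1 GB/s)
    print(f"{doubling_time_years(2007, 0.2, 2022, 46.1):.2f} years")
```

With these two points the implied doubling time comes out slightly under two years.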
• Detector is streaming frames over UDP
- Receiver using Linux Datagram Socket
• Conversion of pixel read-out
- CPU SIMD code
• Compression
- CPU compression
First approach: scale conventional architecture
Page 17
Aim: 20 GB/s
Reached: 5 GB/s
POWER / OpenCAPI / FPGA architecture
Page 18
• Real-time performance
- FPGA design is cycle-accurate, with fixed latency and throughput
• Large memory throughput
- FPGAs with HBM2 have 460 GB/s bandwidth to 8 GB of memory
• Ethernet on-board
- FPGAs are made to work with networks, often having dedicated “hard” cores for Ethernet
• Development for FPGAs is difficult and time-consuming
- Hardware description languages
- PCI Express
• Virtex UltraScale+ HBM (XCVU33P and XCVU35P)
- Available as low-profile half-length 75 W cards
FPGAs are perfect devices for data acquisition
Page 19
• C/C++ compiler that produces a hardware description language (Verilog or VHDL)
• All code is valid C++; it can be executed on a CPU and is generally functionally equivalent
• Dedicated pragmas guide FPGA synthesis
• It is generally understandable for software developers, but may contain constructs that look strange or suboptimal from a software point of view
High-level synthesis
Page 20
Bitshuffle for 16-bit numbers
• For VU33/35P:
- Size: 8 GB
- Bandwidth: up to 460 GB/s
- Latency: up to 120 cycles @ 200 MHz
• Complex architecture
- 32 x 256-bit AXI3 interfaces
- Either operating as 32 separate memories
- Or as single memory with crossbar (at the cost of up to 50% throughput)
• 256-bit is a problem, as data are 512-bit (PCIe Gen3 x16) or 1024-bit (OpenCAPI,
PCIe Gen4 x16)
• Simulation only with special tools (Cadence Xcelium), impossible with Xilinx tools
High-bandwidth memory
Page 21
• PCI Express is a CPU-centric bus, as it is designed to support peripherals
• This is a good model when the FPGA is a coprocessor to the CPU, which sends data and waits for a reply
=> but for data acquisition it is the FPGA that produces the data; the CPU has no prior knowledge of which packet will be processed at a given time
• DMA operates on physical addresses: virtual addresses need to be pinned by the kernel (so they are not swapped or moved)
=> need to maintain own driver
=> address translation cache possible on FPGA, but requires memory
PCI Express DMA
Page 22
Xilinx QDMA is a robust
but highly complex
solution for PCI Express –
used to interface FPGAs
with x86 AMD and Intel
CPUs
• IBM POWER9 showed great numbers for I/O and memory throughput in the Summit and Sierra supercomputers
• IBM designed its own memory-coherent interface for accelerators (CAPI/OpenCAPI), which has advantages over PCIe
POWER architecture
Page 23
Source: Wikipedia
OpenCAPI
Page 24
[Photo labels: FPGA board, POWER9 CPU, OpenCAPI cable]
• Predecessor CAPI => proprietary IBM
• Communication over PCIe physical lines (but different protocol)
• OpenCAPI => consortium model
• Dedicated cabling (8 x 25 Gbit/s lanes)
• For POWER10 this will be the default memory interface (allowing any type of memory to be attached to the CPU and memory to be “shared” over the network)
• Similar to the difference that virtual memory on the 80286/80386 brought to software development
• In OpenCAPI only a single kernel operation is needed
=> Attach the accelerator to a running process
• The accelerator then has access to the virtual address space of the running process; it is the FPGA that initiates the communication
=> Address translation is handled by the TLB and OS
=> FPGA sees memory in a fully cache-coherent way
• All security/reliability/efficiency mechanisms of the CPU and kernel are also present in OpenCAPI
Page 26
What difference brings OpenCAPI?
Source: Wikipedia
• Main function of the action contains a pointer to the virtual address space
- On the device the pointer is synthesized as a 1024-bit master memory-mapped AXI interface
- On the CPU this pointer just has to be set to zero (the first address of the virtual address space)
• Any cell in virtual memory is accessed as an offset from this pointer
• The only requirement is that memory is aligned to 128 bytes
- No special memory allocator; malloc or mmap is fine
- No pinning/registering
• The same memory buffer class is used for both simulation and work with the device
• For configuration, there is also a 4 MiB memory-mapped I/O space (like a BAR in PCIe)
- On the device implemented as slave AXI-Lite (32-bit)
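The 128-byte alignment requirement can be illustrated with a generic over-allocation trick: allocate slightly more than needed and start the usable region at the next multiple of 128. This Python sketch only demonstrates the address arithmetic; it is not OC-Accel code, and the function name is hypothetical. In C or C++ one would more likely use an aligned allocation call.

```python
import ctypes

ALIGNMENT = 128  # bytes, as required for buffers shared with the accelerator


def allocate_aligned(size, alignment=ALIGNMENT):
    """Return (memoryview, address) of a writable region aligned to `alignment`."""
    raw = bytearray(size + alignment - 1)           # room to shift the start
    base = ctypes.addressof(ctypes.c_char.from_buffer(raw))
    offset = (-base) % alignment                    # bytes to skip to reach alignment
    view = memoryview(raw)[offset:offset + size]    # keeps `raw` alive
    return view, base + offset


if __name__ == "__main__":
    view, address = allocate_aligned(4096)
    print(hex(address), address % ALIGNMENT, len(view))
```

Because the accelerator addresses memory as an offset from virtual address zero, the value an action would be given for such a buffer is simply its virtual address.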
How to develop with OpenCAPI?
Page 27
• Open source “shell” maintained by IBM
• http://github.com/OpenCAPI/oc-accel
• Provides ready-made tools to work with OpenCAPI (from transceiver setup to AXImm bridge)
• Provides preconfigured interfaces for I/O peripherals (HBM, 100G, NVMe)
• Provides simulation environment
- One can simulate both SW and HW in a single simulation (both user FPGA
design and software are not modified from their “real” implementation)
OC-Accel
Page 28
Jungfraujoch – FPGA implementation
Page 29
Page 30
Jungfraujoch server
Streaming cores: Ethernet UDP/IP, Dark current, Conversion, Frame summation, Strong pixel finder, Bitshuffle, Memory writer
FPGA board with OpenCAPI interface
- Data acquisition
- Initial data analysis
- Pre-compression
(2.5 Mpixel/board for JF)
Up to 50 GB/s acquisition and data analysis in a single 2U IBM POWER9 server with 1-4 FPGA boards
Page 31
Jungfraujoch FPGA streaming design
Modular design
• Stream of data handled by successive cores doing work in parallel
=> throughput and latency of each core are determined by the hardware design
• Extra stages can be added relatively simply, with the option to bypass cores
• All cores are C++ functions, connected with AXI-Stream FIFOs
• As buffering is expensive on FPGA, the design is best suited for algorithms that have limited dependencies between frames
Page 32
Jungfraujoch
Ethernet UDP/IP core
Processes ethernet packets from network, ignores unnecessary packets, reads
frame header to get frame number, module number, etc.
Page 33
Jungfraujoch
Dark current core
This core calculates a moving average of detector frames.
The calculated value is used as the dark current (pedestal) for subsequent frames.
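A per-pixel moving average of this kind can be sketched as follows. The slides only say that a moving average is used; the fixed-length window, the class name MovingPedestal and the list-based frames are assumptions made for illustration.

```python
from collections import deque


class MovingPedestal:
    """Per-pixel moving average over the last `window` dark frames."""

    def __init__(self, n_pixels, window):
        self.window = window
        self.frames = deque()
        self.sums = [0] * n_pixels   # running per-pixel sums

    def update(self, frame):
        """Add a frame and drop the oldest one once the window is full."""
        self.frames.append(list(frame))
        for i, value in enumerate(frame):
            self.sums[i] += value
        if len(self.frames) > self.window:
            oldest = self.frames.popleft()
            for i, value in enumerate(oldest):
                self.sums[i] -= value

    def pedestal(self):
        """Current per-pixel average, used as pedestal for later frames."""
        n = len(self.frames)
        if n == 0:
            raise ValueError("no frames accumulated yet")
        return [s / n for s in self.sums]


if __name__ == "__main__":
    tracker = MovingPedestal(n_pixels=2, window=2)
    for frame in ([100, 200], [102, 198], [104, 202]):
        tracker.update(frame)
    print(tracker.pedestal())
```

Keeping running sums means each new frame costs one addition and one subtraction per pixel, regardless of the window length.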
Page 34
Jungfraujoch
Conversion core
This core translates the JUNGFRAU read-out into units of energy or photon counts.
It benefits from the very fast HBM2 memory within the FPGA (460 GB/s). Data
leaving this core can be processed by data analysis software.
Page 35
Jungfraujoch
Frame summation core (work in progress)
As data leaving the gain correction core are on a linear scale, they can be summed to
reduce the downstream data rate if a lower frame rate than that of the detector is
needed.
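Summing converted frames can be sketched as a small streaming function. The grouping of a fixed number of consecutive frames and the name sum_frames are assumptions; the slides mark this core as work in progress and do not describe its interface.

```python
def sum_frames(frames, n):
    """Yield the pixel-wise sum of every `n` consecutive frames.

    A trailing group with fewer than `n` frames is dropped.
    """
    accumulator = None
    count = 0
    for frame in frames:
        if accumulator is None:
            accumulator = list(frame)
        else:
            accumulator = [a + b for a, b in zip(accumulator, frame)]
        count += 1
        if count == n:
            yield accumulator
            accumulator = None
            count = 0


if __name__ == "__main__":
    stream = [[1, 2], [3, 4], [5, 6], [7, 8]]
    print(list(sum_frames(stream, 2)))
```

Summing n frames divides the downstream frame rate by n while keeping the total signal, which only works because the converted values are linear in photon number.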
Page 36
Jungfraujoch
Strong pixel finder core
This is the first step of a spot finding algorithm (for example COLSPOT). It identifies
pixels that are stronger than a given number of standard deviations of their
neighborhood.
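The strong-pixel criterion can be illustrated with a plain Python sketch. It is not COLSPOT or the FPGA implementation: the square neighborhood, the exclusion of the centre pixel from the statistics, and the parameter names are assumptions made for this example.

```python
import math


def strong_pixels(image, n_sigma=3.0, radius=1):
    """Return (row, col) of pixels exceeding mean + n_sigma * std of their neighbors."""
    rows, cols = len(image), len(image[0])
    found = []
    for r in range(rows):
        for c in range(cols):
            # Neighborhood clipped at the image border, centre pixel excluded.
            neighbors = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            if not neighbors:
                continue
            mean = sum(neighbors) / len(neighbors)
            variance = sum((v - mean) ** 2 for v in neighbors) / len(neighbors)
            if image[r][c] > mean + n_sigma * math.sqrt(variance):
                found.append((r, c))
    return found


if __name__ == "__main__":
    image = [[10] * 5 for _ in range(5)]
    image[2][2] = 100  # single bright pixel on a flat background
    print(strong_pixels(image))
```

Excluding the centre pixel keeps a single bright pixel from inflating its own threshold, which would otherwise hide isolated spots in small neighborhoods.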
Page 37
Jungfraujoch
Bitshuffle
FPGAs are bit-order agnostic. Therefore, exchanging the bit order in this popular
compression prefilter is pretty much free on an FPGA.
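The idea behind the bitshuffle prefilter for 16-bit numbers can be shown with a small bit-transpose sketch: bit 0 of every value is stored first, then bit 1, and so on, so that the mostly-zero high bits form long runs that compress well. The function names are illustrative, and the exact bit and byte ordering of the bitshuffle library is not claimed to be reproduced here.

```python
def bitshuffle16(values):
    """Transpose bits of 16-bit values: plane b holds bit b of every value."""
    n = len(values)
    if n % 8 != 0:
        raise ValueError("number of values must be a multiple of 8")
    out = bytearray()
    for bit in range(16):
        for start in range(0, n, 8):
            byte = 0
            for k in range(8):
                byte |= ((values[start + k] >> bit) & 1) << k
            out.append(byte)
    return bytes(out)


def bitunshuffle16(data, n):
    """Inverse of bitshuffle16 for `n` values."""
    values = [0] * n
    bytes_per_plane = n // 8
    for bit in range(16):
        for j in range(bytes_per_plane):
            byte = data[bit * bytes_per_plane + j]
            for k in range(8):
                values[j * 8 + k] |= ((byte >> k) & 1) << bit
    return values


if __name__ == "__main__":
    pixels = [1, 0, 3, 2, 1, 1, 0, 2]   # small photon counts
    shuffled = bitshuffle16(pixels)
    print(shuffled.hex())                # only the first two bit planes are non-zero
    print(bitunshuffle16(shuffled, len(pixels)) == pixels)
```

On a CPU this transpose needs many shift and mask operations; in hardware it amounts to routing wires differently.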
Page 38
Jungfraujoch
Host memory write
The address in the host memory buffer is calculated and data are forwarded to host memory
via OpenCAPI. Additional image statistics are saved as well.
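An address calculation of this kind might look like the sketch below. The buffer layout is hypothetical (a ring of frame slots, with the modules of one frame stored back to back); the slides do not describe the actual layout. Only the module size comes from the slides.

```python
MODULE_BYTES = 524_288 * 2   # one module, 16 bits per pixel (1 MiB)
ALIGNMENT = 128              # OpenCAPI buffers must be 128-byte aligned


def write_offset(frame_number, module, n_modules, buffer_frames):
    """Byte offset of a module image inside a ring buffer of frames.

    Hypothetical layout: slot = frame_number modulo buffer_frames,
    modules of one frame stored consecutively within the slot.
    """
    if not 0 <= module < n_modules:
        raise ValueError("module index out of range")
    slot = frame_number % buffer_frames
    return (slot * n_modules + module) * MODULE_BYTES


if __name__ == "__main__":
    offset = write_offset(frame_number=1, module=2, n_modules=4, buffer_frames=1000)
    print(offset, offset % ALIGNMENT)
```

Because one module image is exactly 1 MiB, every offset in this layout is automatically a multiple of 128 bytes.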
Page 39
Jungfraujoch implementation on VU33P FPGA
[Figure labels: Spot finding, HBM, Gain, Pedestal, Write data, OpenCAPI, 100G, UDP]
Jungfraujoch FPGA power usage is 18 W/board
for the whole streaming functionality
Page 40
Xilinx Vivado Power Report
2 boards for 4 Mpixel JUNGFRAU and 4 boards for 10 Mpixel JUNGFRAU
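The board counts can be cross-checked against raw link capacities. The sketch below assumes 8 modules for the 4 Mpixel and 20 modules for the 10 Mpixel detector (two 10 Gbit/s links per module), one 100 Gbit/s Ethernet input per board, and the 8 x 25 Gbit/s OpenCAPI link; protocol overheads are ignored, so the numbers are upper bounds rather than measured throughput.

```python
MODULE_RATE_GB_S = 524_288 * 2_200 * 2 / 1e9   # one module at 2.2 kHz, 16-bit pixels
ETHERNET_100G_GB_S = 100 / 8                    # raw rate of one 100 Gbit/s port
OPENCAPI_GB_S = 8 * 25 / 8                      # raw rate of 8 x 25 Gbit/s lanes


def board_load_gb_s(total_modules, boards):
    """Raw data rate each board must handle if modules are split evenly."""
    if total_modules % boards != 0:
        raise ValueError("modules do not divide evenly between boards")
    return (total_modules // boards) * MODULE_RATE_GB_S


if __name__ == "__main__":
    for name, modules, boards in (("4 Mpixel", 8, 2), ("10 Mpixel", 20, 4)):
        load = board_load_gb_s(modules, boards)
        print(f"{name}: {load:.2f} GB/s per board "
              f"(Ethernet {ETHERNET_100G_GB_S} GB/s, OpenCAPI {OPENCAPI_GB_S} GB/s)")
```

Under these assumptions each board stays below both raw limits, and four boards at 12.5 GB/s each match the "up to 50 GB/s" figure quoted for the server.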
• VU33P or VU35P with 8 GB of HBM2
• OpenCAPI link and PCIe Gen3 x16 (or two
PCIe Gen4 x8)
• Small flash (2 kb) to store MAC address,
board IR
• QSFP-DD optical socket (same as QSFP28,
but with 8 lanes for 2x100G) =>
compatible with QSFP28 transceivers
• Up to 75W
Alpha Data 9H3 board
Page 41
• Software tests – Catch2
- 8 min
- Among other software tests includes 13
FPGA action tests (whole SLS code)
- Automated tests cover 95% lines of high-
level synthesis code
- Covers most of the functionality
correctness – including address calculation
- Main limitation is debugging of FIFOs
parallel behavior (deadlocks, etc.)
• Hardware simulation – Cadence Xcelium
- 4 hours
- Collection of 8 frames from single module
- Checks if hardware description is correct,
can find problems with synchronization,
and other, very rare, issues
- Too slow to verify functionality
OpenCAPI programming - testing
Page 42
• The detector and data acquisition system were sent in November for an experiment at the Photon Factory, KEK
• More than 2,000 datasets collected for protein targets; a few real-life native-SAD structures solved
• Due to the pandemic, detector support and development (including deployment of new FPGA designs) were done fully remotely from Switzerland
Commissioning in KEK (Jan – May 2021)
Page 43
BL-1A Photon Factory
JUNGFRAU detector (up)
tested in helium chamber
for native-SAD
measurements with 3.75
keV X-rays
Page 44
Structure of Nucleocapsid Phosphoprotein from
SARS-CoV-2 solved in 1 second
• The crystal was previously measured with a conventional setup at our beamline, with the measurement taking longer than one minute
• With the JUNGFRAU detector and OpenCAPI readout, 2000 images collected in one second were enough to solve the structure of this protein
• Experimental team: Filip Leonarski, Sylvain
Engilberge, Vincent Olieric, Meitian Wang (MX
Group), Aldo Mozzanica (PSI Detector Group)
• SARS-CoV-2 protein was produced by Zinzula, L.,
Basquin, J., Bracher, A., Baumeister, W. (MPI,
Martinsried)
Possible gain from using FPGA based system
Page 45
Courtesy: B. Mesnet (IBM)
MX Group (PSI)
• Vincent Olieric
• Takashi Tomizaki
• Chia-Ying Huang
• Sylvain Engilberge
• Justyna Wojdyła
• Meitian Wang
Detector Group (PSI)
• Aldo Mozzanica
• Martin Brückner
• Carlos Lopez-Cuenca
• Bernd Schmitt
Science IT (PSI)
• Leonardo Sala
Controls (PSI)
• Andrej Babic
• Leonardo Hax-Damiani
SLS management (PSI)
• Oliver Bunk
Photon Factory, KEK
• Naohiro Matsugaki
• Yusuke Yamada
• Masahide Hikita
MAX IV
• Jie Nan
• Zdenek Matej
Uni Konstanz
• Kay Diederichs
LBL
• Aaron Brewster
DLS
• Graeme Winter
• DIALS Team
ESRF
• Jerome Kieffer
IBM Systems (France)
• Alexandre Castellane
• Bruno Mesnet
InnoBoost SA
• Lionel Clavien
Acknowledgements
Page 47

A2O Core implementation on FPGA
 
OpenPOWER Foundation Introduction
OpenPOWER Foundation Introduction OpenPOWER Foundation Introduction
OpenPOWER Foundation Introduction
 
Open Hardware and Future Computing
Open Hardware and Future ComputingOpen Hardware and Future Computing
Open Hardware and Future Computing
 
AI/Cloud Technology access
AI/Cloud Technology access AI/Cloud Technology access
AI/Cloud Technology access
 

Dernier

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Dernier (20)

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray Camera at the Swiss Light Source synchrotron

  • 1. WIR SCHAFFEN WISSEN – HEUTE FÜR MORGEN. OpenCAPI-based image analysis pipeline for 18 GB/s kHz-framerate X-ray camera at the SLS synchrotron. Filip Leonarski :: Beamline Data Scientist :: Macromolecular Crystallography
  • 2. Plan
      • Introduction: Macromolecular crystallography at synchrotrons and X-ray detectors
      • Technology: POWER + OpenCAPI
      • Solution: Jungfraujoch
  • 3. X-ray
      • 1901 Nobel Prize: W. Röntgen, discovery of X-rays
  • 4. X-ray macromolecular crystallography (MX)
      • 1901 Nobel Prize: W. Röntgen, discovery of X-rays
      • 1962 Nobel Prize: F. Crick, J. Watson and M. Wilkins, structure of the DNA double helix solved with X-rays (Photo 51 by R. Gosling and R. Franklin)
  • 5. X-ray macromolecular crystallography (MX)
      • 1901 Nobel Prize: W. Röntgen, discovery of X-rays
      • 1962 Nobel Prize: F. Crick, J. Watson and M. Wilkins, structure of the DNA double helix solved with X-rays (Photo 51 by R. Gosling and R. Franklin)
      • 2009 Nobel Prize: V. Ramakrishnan*, T. Steitz, A. Yonath*, structure of the ribosome
      • (*) some of their structures were solved at PSI
  • 6. X-ray macromolecular crystallography (MX)
      • Wikipedia: X-ray crystallography is the experimental science determining the atomic and molecular structure of a crystal, in which the crystalline structure causes a beam of incident X-rays to diffract into many specific directions. By measuring the angles and intensities of these diffracted beams, a crystallographer can produce a three-dimensional picture of the density of electrons within the crystal.
  • 7. MX at synchrotron
      • Particle accelerators are the source of the brightest X-ray beams (multiple orders of magnitude brighter than conventional X-ray tubes); the radiation is emitted when charged particles travel through a magnetic field
        - The effect is a nuisance for high energy physics (undesirable energy loss),
        - but it is a blessing for structural science => modern storage rings are built exclusively as light sources.
      • Synchrotrons provide a continuous X-ray beam, while X-ray free electron lasers produce bright, femtosecond-long pulses
  • 8. Paul Scherrer Institute (image labels: SwissFEL, Swiss Light Source, Swiss Alps)
  • 9. MX at Swiss Light Source and SwissFEL
      • 3 experimental stations at the synchrotron
      • 1 experimental station at the SwissFEL
      • Beamtime is shared between academic and industrial users
        - Industrial customers are mostly pharmaceutical companies looking for drug binding to potential drug targets
        - Academic users are universities and scientific institutes worldwide doing basic research in structural biology
  • 10. Major upgrade in 2024/2025 for SLS 2.0
      • New storage ring to be installed in 2024-2025
      • Flux (photons/second) will increase by an order of magnitude
      • Measurements can be done 10x faster
      • Enables the fragment screening method, i.e. a single protein target is crystallized with hundreds or thousands of molecular fragments to find the best drug
        - This is like molecular docking, but fully experimental
  • 11. New detector for SwissFEL and SLS 2.0
      • PSI is a major detector developer
        - Hybrid pixel detectors were developed for CERN high energy physics experiments
        - The design could be used for X-ray cameras: first PILATUS in the 2000s
        - PSI start-up Dectris commercialized the PILATUS and EIGER detectors; most synchrotrons are equipped with their detectors
      • Currently PSI is rolling out a new generation: JUNGFRAU
  • 12. Adaptive gain detector to increase dynamic range
      • Silicon sensor converts X-rays to electric charge
      • An ASIC with dedicated electronics for each pixel is bump-bonded to the sensor
      • Each pixel has three capacitors, allowing different amplification
      • They are dynamically switched during exposure to adjust for incoming charge
      • Aim: measure reliably from 1 to 20,000,000 photons per second
  • 13. Adaptive gain detector to increase dynamic range
      • Pixel output in JF: 0001010111110011, a 16-bit word combining gain bits and an ADC value
      • Gain: 00: G0, 01: G1, 11: G2
      • Photon number: $\text{photon number} = \dfrac{\text{ADC} - \text{pedestal}}{\text{gain} \times \text{photon energy}}$
      • Gain and pedestal factors are specific for pixel and gain setting (prior calibration, dedicated dark run)
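
The conversion on slide 13 can be sketched in a few lines of Python. This is only an illustration, not the Jungfraujoch code: it assumes the two most significant bits of the 16-bit word carry the gain stage and the lower 14 bits carry the ADC value, and the pedestal, gain and photon energy numbers in the usage note are made-up placeholders. The names `decode_pixel` and `photon_count` are hypothetical.

```python
# Assumption: bits 15-14 encode the gain stage, bits 13-0 the ADC value.
GAIN_NAMES = {0b00: "G0", 0b01: "G1", 0b11: "G2"}  # 0b10 is not a listed code


def decode_pixel(raw):
    """Split a 16-bit read-out word into (gain stage name, ADC value)."""
    gain_bits = (raw >> 14) & 0b11
    adc = raw & 0x3FFF
    return GAIN_NAMES[gain_bits], adc


def photon_count(raw, pedestal, gain, photon_energy):
    """Apply photon number = (ADC - pedestal) / (gain * photon energy).

    `pedestal` and `gain` are dictionaries keyed by gain stage, holding the
    factors for this one pixel (hypothetical data layout).
    """
    stage, adc = decode_pixel(raw)
    return (adc - pedestal[stage]) / (gain[stage] * photon_energy)
```

For example, `decode_pixel(0b0001010111110011)` gives `("G0", 5619)` for the word shown on the slide; with a placeholder pedestal of 3000 and gain of 40 per energy unit at a photon energy of 12.4, `photon_count` returns about 5.28 photons.
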
  • 14. Modular detector
      • Detector is modular
      • 524,288 pixels per module
      • 2.2 kHz * 524,288 pixels * 16 bit = 2.3 GB/s
        - 2 x 10 Gbit/s links
      • 4 Mpixel detector (2020)
        - 16 x 10 Gbit/s
      • 10 Mpixel detector (2022)
        - 40 x 10 Gbit/s
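
The arithmetic on slide 14 can be checked with a short script. The module counts (8 for 4 Mpixel, 20 for 10 Mpixel) are derived here by dividing the pixel counts by 524,288 and are not stated on the slide; the function names are illustrative only.

```python
import math

PIXELS_PER_MODULE = 524_288
FRAME_RATE_HZ = 2_200      # 2.2 kHz
BYTES_PER_PIXEL = 2        # 16 bit
LINK_GBIT_S = 10           # one 10 Gbit/s Ethernet link


def module_rate_gb_s():
    """Raw data rate of a single module in GB/s (1 GB = 1e9 bytes)."""
    return FRAME_RATE_HZ * PIXELS_PER_MODULE * BYTES_PER_PIXEL / 1e9


def detector_rate_gb_s(n_modules):
    """Raw data rate of a detector built from n_modules modules."""
    return n_modules * module_rate_gb_s()


def links_needed(n_modules):
    """Number of 10 Gbit/s links, assuming links are not shared between modules."""
    per_module = math.ceil(module_rate_gb_s() * 8 / LINK_GBIT_S)
    return n_modules * per_module
```

One module produces about 2.31 GB/s (18.5 Gbit/s), which is why two 10 Gbit/s links are needed. Eight modules give about 18.45 GB/s over 16 links, and twenty modules about 46.14 GB/s over 40 links, matching the 18.4 GB/s and 46.1 GB/s figures quoted for the 4 Mpixel and 10 Mpixel detectors.
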
  • 15. MX detector data rates double every 2 years (plot: data rate in GB/s on a logarithmic scale against year, 2006 to 2024)
      • 2007 PSI PILATUS: 6 Mpixel, 12.5 Hz, 0.2 GB/s
      • 2014 Dectris EIGER: 16 Mpixel, 133 Hz, 3.4 GB/s
      • 2019 Dectris EIGER 2 XE: 16 Mpixel, 400 Hz, 13.5 GB/s
      • 2020 PSI JUNGFRAU: 4 Mpixel, 2200 Hz, 18.4 GB/s
      • 2022 PSI JUNGFRAU: 10 Mpixel, 2200 Hz, 46.1 GB/s
  • 16. First approach: scale conventional architecture
      • Detector is streaming frames over UDP
        - Receiver using Linux Datagram Socket
      • Conversion of pixel read-out
        - CPU SIMD code
      • Compression
        - CPU compression
  • 17. First approach: scale conventional architecture
      • Detector is streaming frames over UDP
        - Receiver using Linux Datagram Socket
      • Conversion of pixel read-out
        - CPU SIMD code
      • Compression
        - CPU compression
      • Aim: 20 GB/s. Reached: 5 GB/s
  • 18. POWER / OpenCAPI / FPGA architecture
  • 19. FPGAs are perfect devices for data acquisition
      • Real-time performance
        - FPGA design is cycle-accurate, with fixed latency and throughput
      • Large memory throughput
        - FPGAs with HBM2 have 460 GB/s bandwidth to 8 GB of memory
      • Ethernet on-board
        - FPGAs are made to work with networks, often having dedicated "hard" cores for Ethernet
      • Development for FPGAs is difficult and time-consuming
        - Hardware description languages
        - PCI Express
      • Virtex UltraScale+ HBM (XCVU33P and XCVU35P)
        - Available as low-profile, half-length 75 W cards
  • 20. High-level synthesis
      • C/C++ compiler that produces a hardware description language (Verilog or VHDL)
      • All code is valid C++ code; it can be executed on a CPU and is, in general, functionally equivalent
      • Dedicated pragmas guide FPGA synthesis
      • The code is generally understandable for software developers, but may contain constructs that look strange or suboptimal from a software point of view
      • Code example on slide: Bitshuffle for 16-bit numbers
  • 21. High-bandwidth memory
      • For VU33/35P:
        - Size: 8 GB
        - Bandwidth: up to 460 GB/s
        - Latency: up to 120 cycles @ 200 MHz
      • Complex architecture
        - 32 x 256-bit AXI3 interfaces
        - Either operating as 32 separate memories
        - Or as a single memory with crossbar (at the cost of up to 50% throughput)
      • 256-bit is a problem, as data are 512-bit (PCIe Gen3 x16) or 1024-bit (OpenCAPI, PCIe Gen4 x16)
      • Simulation only with special tools (Cadence Xcelium), impossible with Xilinx tools
  • 22. PCI Express DMA
      • PCI Express is a CPU-centric bus, as it is designed to support peripherals
      • This is a good model when the FPGA is a coprocessor to the CPU, which sends data and waits for a reply => but for data acquisition it is the FPGA that produces the data, and the CPU has no prior knowledge of which packet will be processed at a given time
      • DMA operates on physical addresses: virtual addresses need to be pinned by the kernel (so they are not swapped or moved)
        => need to maintain own driver
        => address translation cache possible on FPGA, but requires memory
      • Xilinx QDMA is a robust but highly complex solution for PCI Express, used to interface FPGAs with x86 AMD and Intel CPUs
  • 23. POWER architecture
      • IBM POWER9 showed great numbers for I/O and memory throughput in the Summit and Sierra supercomputers
      • IBM designed its own memory-coherent interface for accelerators (CAPI/OpenCAPI), which has advantages over PCIe
      • Image source: Wikipedia
  • 25. OpenCAPI (image labels: FPGA board, POWER9 CPU, OpenCAPI cable)
      • Predecessor CAPI => proprietary IBM
      • Communication over PCIe physical lines (but different protocol)
      • OpenCAPI => consortium model
      • Dedicated cabling (8 x 25 Gbit/s lines)
      • For POWER10 this will be the default memory interface (allowing any type of memory to be attached to the CPU, and memory to be "shared" over the network)
  • 26. What difference does OpenCAPI bring?
      • A difference similar to what the 80286/80386 virtual mode brought to software development
      • In OpenCAPI one needs a single kernel operation => attach accelerator to running process
      • Then the accelerator has access to the virtual address space of the running process; it is the FPGA that initiates the communication
        => Address translation is handled by TLB and OS
        => FPGA sees memory in a fully cache-coherent way
      • All security/reliability/efficiency mechanisms in CPU and kernel are also present in OpenCAPI
      • Image source: Wikipedia
  • 27. How to develop with OpenCAPI?
      • Main function for the action contains a pointer to the virtual address space
        - On the device the pointer will be synthesized as a 1024-bit master memory-mapped AXI interface
        - On the CPU this pointer just has to be set to zero (which is the first address of the virtual address space)
      • Any cell in virtual memory is simply accessed as an offset from this pointer
      • The only requirement is that memory is aligned to 128 bytes
        - No special memory allocator, malloc or mmap is fine
        - No pinning/registering
      • The same memory buffer class is used for both simulation and working with the device
      • For configuration, there is also a 4 MiB memory-mapped I/O space (like BAR in PCIe)
        - On the device implemented as slave AXI-Lite (32-bit)
  • 28. OC-Accel
      • Open source "shell" maintained by IBM
      • http://github.com/OpenCAPI/oc-accel
      • Provides ready-made tools to work with OpenCAPI (from transceiver setup to AXImm bridge)
      • Provides preconfigured interfaces for I/O peripherals (HBM, 100G, NVMe)
      • Provides a simulation environment
        - One can simulate both SW and HW in a single simulation (both the user FPGA design and the software are unmodified from their "real" implementation)
  • 29. Jungfraujoch – FPGA implementation
  • 30. Jungfraujoch server
      • FPGA board with OpenCAPI interface
        - Data acquisition
        - Initial data analysis
        - Pre-compression (2.5 Mpixel/board for JF)
      • Up to 50 GB/s acquisition and data analysis in a single 2U IBM POWER9 server with 1-4 FPGA boards
      • Pipeline diagram labels: Ethernet UDP/IP, Dark current, Conversion, Frame summation, Strong pixel finder, Bitshuffle, Memory writer
  • 31. Jungfraujoch FPGA streaming design: modular design
      • Stream of data handled by successive cores doing work in parallel => throughput and latency of each core are determined by the hardware design
      • Extra stages can be added relatively simply, with an option to bypass cores
      • All cores are C++ functions, connected with AXI-Stream FIFOs
      • As buffering is expensive on FPGA, the design is best suited for algorithms that have limited dependencies between frames
  • 32. Jungfraujoch Ethernet UDP/IP core: processes Ethernet packets from the network, ignores unnecessary packets, and reads the frame header to get frame number, module number, etc.
  • 33. Jungfraujoch Dark current core: this core is responsible for calculating a moving average of detector frames. The calculated value is used as dark current (pedestal) for subsequent frames.
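
A per-pixel moving average of the kind described on slide 33 can be sketched as follows. The slide does not say which form of moving average the core uses; the exponentially weighted update below is an assumption chosen because it needs no buffer of past frames, and the name `update_pedestal` and the `window` parameter are hypothetical.

```python
def update_pedestal(pedestal, frame, window=128):
    """Return a new per-pixel pedestal after seeing one more dark frame.

    Exponentially weighted moving average: each new frame contributes
    1/window of the difference between its value and the running estimate.
    """
    alpha = 1.0 / window
    return [p + alpha * (x - p) for p, x in zip(pedestal, frame)]
```

Calling `update_pedestal([100.0, 200.0], [228.0, 200.0])` moves the first pixel's estimate from 100.0 to 101.0 and leaves the second unchanged at 200.0.
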
  • 34. Jungfraujoch Conversion core: this core translates the JUNGFRAU read-out into units of energy or photon counts. It benefits from the very fast HBM2 memory within the FPGA (460 GB/s). Data leaving this core can be used for processing by data analysis software.
  • 35. Jungfraujoch Frame summation core (work in progress): because data leaving the gain correction core are on a linear scale, frames can be summed to reduce the downstream data rate when a lower frame rate than that of the detector is sufficient.
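
Frame summation as described on slide 35 amounts to adding groups of consecutive converted frames pixel by pixel. A minimal sketch, with the hypothetical name `sum_frames` and the assumption that an incomplete trailing group is simply dropped:

```python
def sum_frames(frames, n):
    """Sum every n consecutive frames pixel-wise.

    `frames` is a list of equally long lists of converted pixel values.
    A trailing group with fewer than n frames is discarded (assumption).
    """
    summed = []
    for start in range(0, len(frames) - n + 1, n):
        group = frames[start:start + n]
        summed.append([sum(pixels) for pixels in zip(*group)])
    return summed
```

With `n = 2`, five input frames yield two output frames, so the downstream frame rate and data rate are roughly halved.
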
  • 36. Jungfraujoch Strong pixel finder core: this is the first step of a spot finding algorithm (for example COLSPOT). It identifies pixels that are stronger than their neighborhood by a given number of standard deviations.
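
The criterion on slide 36 can be illustrated with a brute-force software version. This is a sketch of the idea only, not the FPGA core or the COLSPOT algorithm: the square neighborhood, its radius, the exclusion of the tested pixel from the statistics, and the name `strong_pixels` are all assumptions. The image must be at least 2 x 2 so that every pixel has a neighbor.

```python
from statistics import mean, pstdev


def strong_pixels(image, n_sigma=3.0, radius=1):
    """Return (row, col) of pixels exceeding mean + n_sigma * std of their neighborhood."""
    rows, cols = len(image), len(image[0])
    found = []
    for r in range(rows):
        for c in range(cols):
            # Neighborhood: square window clipped at the image edge,
            # without the pixel under test.
            neighbors = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            threshold = mean(neighbors) + n_sigma * pstdev(neighbors)
            if image[r][c] > threshold:
                found.append((r, c))
    return found
```

On a flat 5 x 5 background of 10 with a single pixel set to 100 in the middle, only that central pixel is reported.
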
  • 37. Jungfraujoch Bitshuffle: FPGAs are bit-order agnostic, so rearranging the bit order, as this popular compression prefilter does, is practically free on an FPGA.
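
To show what the prefilter does, here is a plain-Python bit transpose for a block of 16-bit values: all least significant bits are stored first, then all second bits, and so on, so that the mostly-zero high bits of low-count pixels end up in long runs that compress well. It is a sketch under assumptions: the bit-plane and in-byte ordering chosen here are not guaranteed to be byte-compatible with the bitshuffle library or with the Jungfraujoch core, and the function names are hypothetical.

```python
def bitshuffle16(values):
    """Bit-transpose a block of 16-bit integers into 16 bit planes.

    Plane k (k = 0 is the least significant bit) packs bit k of every value,
    eight values per output byte. The block length must be a multiple of 8.
    """
    if len(values) % 8 != 0:
        raise ValueError("block length must be a multiple of 8")
    out = bytearray()
    for bit in range(16):
        for start in range(0, len(values), 8):
            byte = 0
            for i, value in enumerate(values[start:start + 8]):
                byte |= ((value >> bit) & 1) << i
            out.append(byte)
    return bytes(out)


def bitunshuffle16(data, n_values):
    """Inverse of bitshuffle16 for a block of n_values values."""
    values = [0] * n_values
    bytes_per_plane = n_values // 8
    for bit in range(16):
        for j in range(bytes_per_plane):
            byte = data[bit * bytes_per_plane + j]
            for i in range(8):
                values[j * 8 + i] |= ((byte >> i) & 1) << bit
    return values
```

For eight pixels that all hold the value 1, the output is one byte 0xFF followed by fifteen zero bytes, which a general-purpose compressor handles far better than the interleaved original.
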
  • 38. Jungfraujoch Host memory write: the address in the host memory buffer is calculated and the data are forwarded to host memory via OpenCAPI. Additional image statistics are saved as well.
  • 39. Jungfraujoch implementation on VU33P FPGA (floorplan labels as extracted: Spot finding HBM Gain Pedestal Write data OpenCAPI 100G UDP)
  • 40. Jungfraujoch FPGA power usage is 18 W/board for the whole streaming functionality (source: Xilinx Vivado Power Report). 2 boards for 4 Mpixel JUNGFRAU and 4 boards for 10 Mpixel JUNGFRAU
  • 41. Alpha Data 9H3 board
      • VU33P or VU35P with 8 GB of HBM2
      • OpenCAPI link and PCIe Gen3 x16 (or two PCIe Gen4 x8)
      • Small flash (2 kb) to store MAC address, board IR
      • QSFP-DD optical socket (same as QSFP28, but with 8 lanes for 2x100G) => compatible with QSFP28 transceivers
      • Up to 75 W
  • 42. OpenCAPI programming: testing
      • Software tests (Catch2): 8 min
        - Among other software tests, includes 13 FPGA action tests (whole SLS code)
        - Automated tests cover 95% of the lines of high-level synthesis code
        - Covers most of the functional correctness, including address calculation
        - Main limitation is debugging the parallel behavior of FIFOs (deadlocks, etc.)
      • Hardware simulation (Cadence Xcelium): 4 hours
        - Collection of 8 frames from a single module
        - Checks whether the hardware description is correct; can find problems with synchronization and other, very rare, issues
        - Too slow to verify functionality
  • 43. Commissioning in KEK (Jan – May 2021)
      • The detector and data acquisition system were sent in November for an experiment at Photon Factory, KEK
      • More than 2,000 datasets collected for protein targets, a few real-life native-SAD structures solved
      • Due to the pandemic, detector support and development (including deployment of a new FPGA design) was done fully remotely from Switzerland
      • Image caption: BL-1A, Photon Factory. JUNGFRAU detector (top) tested in a helium chamber for native-SAD measurements with 3.75 keV X-rays
  • 44. Structure of Nucleocapsid Phosphoprotein from SARS-CoV-2 solved in 1 second
      • The crystal was previously measured with a conventional setup at our beamline, with the measurement taking longer than one minute
      • With the JUNGFRAU detector and OpenCAPI readout, 2000 images collected in one second were enough to solve the structure of this protein
      • Experimental team: Filip Leonarski, Sylvain Engilberge, Vincent Olieric, Meitian Wang (MX Group), Aldo Mozzanica (PSI Detector Group)
      • SARS-CoV-2 protein was produced by Zinzula, L., Basquin, J., Bracher, A., Baumeister, W. (MPI, Martinsried)
  • 45. Possible gain from using FPGA based system (figure courtesy of B. Mesnet, IBM)
  • 46. Possible gain from using FPGA based system (figure courtesy of B. Mesnet, IBM)
  • 47. Acknowledgements
      • MX Group (PSI): Vincent Olieric, Takashi Tomizaki, Chia-Ying Huang, Sylvain Engilberge, Justyna Wojdyła, Meitian Wang
      • Detector Group (PSI): Aldo Mozzanica, Martin Brückner, Carlos Lopez-Cuenca, Bernd Schmitt
      • Science IT (PSI): Leonardo Sala
      • Controls (PSI): Andrej Babic, Leonardo Hax-Damiani
      • SLS management (PSI): Oliver Bunk
      • Photon Factory, KEK: Naohiro Matsugaki, Yusuke Yamada, Masahide Hikita
      • MAX IV: Jie Nan, Zdenek Matej
      • Uni Konstanz: Kay Diederichs
      • LBL: Aaron Brewster
      • DLS: Graeme Winter, DIALS Team
      • ESRF: Jerome Kieffer
      • IBM Systems (France): Alexandre Castellane, Bruno Mesnet
      • InnoBoost SA: Lionel Clavien