HPC and CFD at EDF with Code_Saturne
Yvan Fournier, Jérôme Bonelle

EDF R&D
Fluid Dynamics, Power Generation and Environment Department


Open Source CFD International Conference Barcelona 2009
Summary


    1. General elements on Code_Saturne
    2. Real-world performance of Code_Saturne
    3. Example applications: fuel assemblies
    4. Parallel implementation of Code_Saturne
    5. Ongoing work and future directions




2   Open Source CFD International Conference 2009
General elements on Code_Saturne




3   Open Source CFD International Conference 2009
Code_Saturne: main capabilities
Physical modelling
      Single-phase laminar and turbulent flows: k-ε, k-ω SST, v2f, RSM, LES
      Radiative heat transfer (DOM, P-1)
      Combustion of coal, gas and heavy fuel oil (EBU, pdf, LWP)
      Electric arc and Joule effect
      Lagrangian module for dispersed particle tracking
      Atmospheric flows (aka Mercure_Saturne)
      Specific engineering module for cooling towers
      ALE method for deformable meshes
      Conjugate heat transfer (SYRTHES & 1D)
      Common structure with NEPTUNE_CFD for Eulerian multiphase flows


Flexibility
      Portability (UNIX, Linux and MacOS X)
      Standalone GUI and integrated in SALOME platform
      Parallel on distributed memory machines
      Periodic boundaries (parallel, arbitrary interfaces)
      Wide range of unstructured meshes with arbitrary interfaces
      Code coupling capabilities (Code_Saturne/Code_Saturne, Code_Saturne/Code_Aster, ...)


4      Open Source CFD International Conference 2009
Code_Saturne: general features
    Technology
     Co-located finite volume, arbitrary unstructured meshes (polyhedral cells), predictor-corrector method
     500 000 lines of code, 50% FORTRAN 90, 40% C, 10% Python
    Development
     1998: Prototype (long time EDF in-house experience, ESTET-ASTRID, N3S, ...)
     2000: version 1.0 (basic modelling, wide range of meshes)
     2001: Qualification for single phase nuclear thermal-hydraulic applications
     2004: Version 1.1 (complex physics, LES, parallel computing)
     2006: Version 1.2 (state of the art turbulence models, GUI)
     2008: Version 1.3 (massively parallel, ALE, code coupling, ...)
           Released as open source (GPL licence)
     2008: Development version 1.4 (parallel I/O, multigrid, atmospheric, cooling towers, ...)
     2009: Development version 2.0-beta (parallel mesh joining, code coupling, easy install & packaging, extended GUI)
              scheduled for industrial release at the beginning of 2010

    Code_Saturne developed under Quality Assurance


5         Open Source CFD International Conference 2009
Code_Saturne subsystems

External libraries (EDF, LGPL):
  • BFT: Base Functions and Types
  • FVM: Finite Volume Mesh
  • MEI: Mathematical Expression Interpreter

[Diagram: tool chain overview]
  Preprocessor: mesh import, mesh joining, periodicity, domain partitioning (input: meshes)
  Parallel Kernel: parallel mesh setup, CFD solver (inputs: restart files, XML data file from the GUI; output: post-processing)
  BFT library: run-time environment, memory logging
  FVM library: parallel mesh management; code coupling (parallel treatment) with Code_Saturne, SYRTHES, Code_Aster, the SALOME platform, ...
  MEI library: Mathematical Expression Interpreter
6          Open Source CFD International Conference 2009
Code_Saturne environment
    Graphical User Interface
     setting up of calculation parameters
     parameters stored in an XML file
     interactive launch of calculations
     some specific physics not yet covered by the GUI
     advanced setup via Fortran user routines




                                                  Integration in the SALOME platform
                                                        extension of GUI capabilities
                                                        mouse selection of boundary zones
                                                        advanced user file management
                                                        from CAD to post-processing in one tool




7       Open Source CFD International Conference 2009
Allowable mesh examples




[Mesh examples: PWR lower plenum; mesh with stretched cells and hanging nodes; composite mesh; 3D polyhedral cells]
8   Open Source CFD International Conference 2009
Joining of non-conforming meshes
    Arbitrary interfaces
     Meshes may be contained in one single file or in several separate files, in any order
     Arbitrary interfaces can be selected by mesh references
     Caution must be exercised if arbitrary interfaces are used:
         in critical regions, or with LES
         with very different mesh refinements, or on curved CAD surfaces
     Often used in ways detrimental to mesh quality, but a functionality we cannot do without as long as we do not have a proven alternative.
       Joining of meshes built in several pieces may also be used to circumvent meshing tool memory limitations.
     Periodicity is also constructed as an extension of mesh joining.




9         Open Source CFD International Conference 2009
Real-world performance of Code_Saturne




10   Open Source CFD International Conference 2009
Code_Saturne: features of note for HPC

     Segregated solver
      All variables are solved independently; coupling terms are explicit
       Diagonal-preconditioned CG used for the pressure equation, Jacobi (or BiCGStab) used for other variables
      More importantly, matrices have no block structure, and are very sparse
       Typically 7 non-zeroes per row for hexahedra, 5 for tetrahedra
       Indirect addressing + no dense blocks mean fewer opportunities for MatVec optimization, as memory bandwidth matters as much as peak flops (see the CSR sketch below)


     Linear equation solvers usually amount to 80% of CPU cost
     (dominated by pressure), gradient reconstruction about 20%
      The larger the mesh, the higher the relative cost of the pressure step
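To make the memory-bandwidth point concrete, here is a minimal sketch (in C, with illustrative names; not Code_Saturne's actual data structures) of a CSR sparse matrix-vector product with the kind of fill described above: each non-zero costs an indirect load of x, and with only ~7 non-zeroes per row there is little arithmetic to hide that traffic behind.

```c
/* Minimal CSR matrix-vector product sketch: y = A.x.
   Names (n_rows, row_index, col_id, val) are illustrative,
   not Code_Saturne's internal structures. */

#include <stddef.h>

void
csr_mat_vec(size_t         n_rows,
            const size_t  *row_index,  /* size n_rows + 1 */
            const size_t  *col_id,     /* size row_index[n_rows] */
            const double  *val,        /* size row_index[n_rows] */
            const double  *x,
            double        *y)
{
  for (size_t i = 0; i < n_rows; i++) {
    double s = 0.0;
    /* ~7 non-zeroes per row for hexahedra, 5 for tetrahedra: each term
       costs an indirect load of x[col_id[j]], so the loop is dominated
       by memory traffic rather than floating-point work. */
    for (size_t j = row_index[i]; j < row_index[i+1]; j++)
      s += val[j] * x[col_id[j]];
    y[i] = s;
  }
}
```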




11         Open Source CFD International Conference 2009
Current performance (1/3)

     2 LES test cases (most I/O factored out)
                        1 M cells: (n_cells_min + n_cells_max)/2 = 880 at 1024 cores, 109 at 8192 cores
                        10 M cells: (n_cells_min + n_cells_max)/2 = 9345 at 1024 cores, 1150 at 8192 cores
[Charts: elapsed time vs. number of cores for the FATHER (1 M hexahedra) and HYPI (10 M hexahedra) LES test cases, on Opteron + InfiniBand, Opteron + Myrinet, NovaScale and Blue Gene/L systems]




12                              Open Source CFD International Conference 2009
Current performance (2/3)

                                         RANS, 100 M tetrahedra + polyhedra (most I/O factored out)
                                          Polyhedra due to mesh joinings may lead to higher load imbalance in the local MatVec at large core counts
                                             96286/102242 min/max cells/core at 1024 cores
                                             11344/12781 min/max cells/core at 8192 cores


[Chart: elapsed time per iteration vs. number of cores for the FA Grid RANS test case, on NovaScale, Blue Gene/L (CO) and Blue Gene/L (VN)]




13                                                    Open Source CFD International Conference 2009
Current performance (3/3)

     Efficiency often goes through an optimum (due to better cache hit rates) before dropping (due to latency induced by parallel synchronization)
                   Example shown here: HYPI (10 M cell LES test case)

[Chart: parallel efficiency vs. number of MPI ranks (1 to 10000), on the Chatou cluster, Tantale, Platine and Blue Gene systems]




14                          Open Source CFD International Conference 2009
High Performance Computing with Code_Saturne

     Code_Saturne used extensively on HPC machines
      in-house EDF clusters
      CCRT calculation centre (CEA based)
      EDF IBM BlueGene machines (8 000 and 32 000 cores)
       Run also on MareNostrum (Barcelona Supercomputing Center), Cray XT, …


     Code_Saturne used as reference in PRACE European project
       reference code for CFD benchmarks on 6 large European HPC centres
       Code_Saturne obtained "gold medal" status for scalability from Daresbury Laboratory (UK, HPCx machine)




15        Open Source CFD International Conference 2009
Example HPC applications: fuel assemblies




16   Open Source CFD International Conference 2009
Fuel Assembly Studies

     Conflicting design goals
       Good thermal mixing properties, requiring turbulent flow
      Limit head loss
      Limit vibrations
      Fuel rods held by dimples and springs, and not welded,
      as they lengthen slightly over the years due to irradiation


     Complex core geometry
      Circa 150 to 250 fuel assemblies per core depending
      on reactor type, 8 to 10 grids per fuel assembly,
      17x17 grid (mostly fuel rods, 24 guide tubes)
       Geometry almost periodic, except for the mix of several fuel assembly types in a given core (reload by 1/3 or 1/4)
       Inlet and wall conditions not periodic, heat production not uniform at fine scale


     Why we study these flows
      Deformation may lead to difficulties in core unload/reload
       Turbulence-induced vibrations of fuel assemblies in PWR power plants are a potential cause of deformation and of fretting wear damage
       These may lead to weeks or months of interruption of operations


17         Open Source CFD International Conference 2009
Prototype FA calculation with Code_Saturne

      PWR nuclear reactor mixing grid mock-up (5x5)
       100 million cells
       calculation run on 4 000 to 8 000 cores
       Main issue is mesh generation




18        Open Source CFD International Conference 2009
LES simulation of reduced FA domain

     Particular features for LES
        SIMPLEC algorithm with Rhie and Chow interpolation
        2nd order in time (Crank-Nicolson and Adams-Bashforth)
        2nd order in space (fully centered, with sub-iterations for non-orthogonal faces)
        Fully hexahedral mesh, 8 million cells
      Boundary conditions
        Implicit periodicity in x and y directions
        Constant inlet conditions
        Wall functions where needed
        Free outlet
      Simulation
        1 million time steps: 40 flow passes, 20 flow passes for averaging (no homogeneous direction)
        CFL_max = 0.8 (Δt = 5×10⁻⁶ s)
        Blue Gene/L system, 1024 processors
        Per time step: 5 s
        For 100 000 time steps: 1 week


19        Open Source CFD International Conference 2009
Parallel implementation of Code_Saturne




20   Open Source CFD International Conference 2009
Base parallel operations (1/4)
     Distributed memory parallelism using domain partitioning
      Use classical “ghost cell” method for both parallelism and periodicity
       Most operations require only ghost cells sharing faces
       Extended neighborhoods for gradients also require ghost cells sharing vertices




        Global reductions (dot products) are also used, especially by the preconditioned conjugate gradient algorithm (a halo-exchange sketch is shown below)

      Periodicity uses the same mechanism
       Vector and tensor rotation is also required
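As an illustration of the ghost cell ("halo") mechanism, the sketch below (hypothetical argument names; not the FVM library API) exchanges values of boundary-adjacent cells with each neighbouring rank so that ghost cells hold up-to-date values before a gradient or MatVec operation.

```c
#include <mpi.h>

/* Minimal halo exchange sketch: for each neighbor rank, send the values of
   the local cells it needs and receive values for our ghost cells.
   send_idx/recv_idx, send_cell_id and the ghost layout are illustrative. */

void
halo_exchange(MPI_Comm     comm,
              int          n_neighbors,   /* assumed <= 64 for this sketch */
              const int   *neighbor_rank,
              const int   *send_idx,      /* size n_neighbors + 1 */
              const int   *send_cell_id,  /* local cells to send */
              const int   *recv_idx,      /* size n_neighbors + 1 */
              int          n_cells,       /* ghost values stored after cells */
              double      *var,           /* size n_cells + n_ghosts */
              double      *send_buf)      /* size send_idx[n_neighbors] */
{
  MPI_Request reqs[2 * 64];
  int n_reqs = 0;

  /* Post receives directly into the ghost part of the array. */
  for (int r = 0; r < n_neighbors; r++)
    MPI_Irecv(var + n_cells + recv_idx[r], recv_idx[r+1] - recv_idx[r],
              MPI_DOUBLE, neighbor_rank[r], 0, comm, &reqs[n_reqs++]);

  /* Pack and send the values of local cells adjacent to each neighbor. */
  for (int r = 0; r < n_neighbors; r++) {
    for (int j = send_idx[r]; j < send_idx[r+1]; j++)
      send_buf[j] = var[send_cell_id[j]];
    MPI_Isend(send_buf + send_idx[r], send_idx[r+1] - send_idx[r],
              MPI_DOUBLE, neighbor_rank[r], 0, comm, &reqs[n_reqs++]);
  }

  MPI_Waitall(n_reqs, reqs, MPI_STATUSES_IGNORE);
}
```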


21       Open Source CFD International Conference 2009
Base parallel operations (2/4)

     Use of global numbering
      We associate a global number to each mesh entity
        A specific C type (fvm_gnum_t) is used for this. It is currently an unsigned integer (usually 32-bit), but an unsigned long integer (64-bit) will become necessary:
          face-cell connectivity for hexahedral cells has size 4·n_faces, with n_faces ≈ 3·n_cells → size ≈ 12·n_cells, so global numbers requiring 64 bits appear around 350 million cells (checked in the small sketch below)
        Currently equal to the initial (pre-partitioning) number
      Allows for partition-independent single-image files
       Essential for restart files, also used for postprocessor output
       Also used for legacy coupling where matches can be saved
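A quick check of the 64-bit threshold mentioned above, under the slide's assumption of roughly 12 face-cell connectivity entries per cell; the type name is only illustrative of the role played by fvm_gnum_t.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative global number type: 32-bit here, 64-bit when needed. */
typedef uint32_t gnum_t;

int
main(void)
{
  /* Face-cell connectivity for hexahedra: ~4 entries per face and
     ~3 faces per cell, i.e. ~12 entries per cell (slide's estimate). */
  const double entries_per_cell = 12.0;
  const double max_32bit = 4294967295.0;  /* UINT32_MAX */

  double n_cells_limit = max_32bit / entries_per_cell;
  printf("32-bit global numbers overflow around %.0f million cells\n",
         n_cells_limit / 1e6);  /* ~358 million, i.e. "around 350 million" */
  return 0;
}
```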




22       Open Source CFD International Conference 2009
Base parallel operations (3/4)

     Use of global numbering
       Redistribution on n blocks (see the block-ownership sketch below)
        n blocks ≤ n cores
        A minimum block size may be set to avoid many small blocks (for some communication or usage schemes), or to force a single block (for I/O with non-parallel libraries)
         In the future, using at most 1 of every p
         processors may improve MPI/IO performance if
         we use a smaller communicator (to be tested)
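A minimal sketch of the block distribution idea (hypothetical names, not the FVM library's actual interface): the block size follows from the global entity count, an optional minimum block size, and a rank step, and the owning rank of any entity is then a direct function of its global number.

```c
#include <stdint.h>

typedef uint64_t gnum_t;  /* illustrative global number type */

typedef struct {
  gnum_t  block_size;   /* entities per block */
  int     rank_step;    /* use 1 rank out of every rank_step */
} block_dist_t;

/* Compute a block distribution over n_ranks ranks for n_g global entities. */
static block_dist_t
block_dist_create(gnum_t n_g, int n_ranks, gnum_t min_block_size, int rank_step)
{
  block_dist_t d;
  int n_blocks = (n_ranks + rank_step - 1) / rank_step;

  d.rank_step = rank_step;
  d.block_size = (n_g + n_blocks - 1) / n_blocks;   /* ceiling division */
  if (d.block_size < min_block_size)
    d.block_size = min_block_size;                  /* avoid many small blocks */
  return d;
}

/* Owning rank of a (1-based) global number: no communication needed. */
static int
block_dist_rank(const block_dist_t *d, gnum_t g_num)
{
  return (int)((g_num - 1) / d->block_size) * d->rank_step;
}
```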




23       Open Source CFD International Conference 2009
Base parallel operations (4/4)

     Conversely, simply using global numbers allows reconstructing a mapping of entity equivalents on neighboring partitions
        Used for parallel ghost cell construction from an initially partitioned mesh with no ghost data
     An arbitrary distribution is inefficient for halo exchange, but allows for simpler data-structure-related algorithms with deterministic performance bounds
        The owning processor is determined simply from the global number, and messages are aggregated




24     Open Source CFD International Conference 2009
Parallel IO (1/2)

     We prefer using single (partition independent) files
      Easily run different stages or restarts of a calculation on different machines or queues
      Avoids having thousands or tens of thousands of files in a directory
      Better transparency of parallelism for the user

     Use MPI I/O when available
      Uses block to partition exchange when reading, partition to block when writing
       Use of indexed datatypes may be tested in the future, but will not be possible everywhere
      Used for reading of preprocessor and partitioner output, as well as for restart
      files
        These files use a unified binary format, consisting of a simple header and a succession of sections
          The MPI I/O pattern is thus a succession of global reads (or local read + broadcast) for section headers and collective reads of data (with a different portion for each rank); a minimal read pattern is sketched below
          We could switch to HDF5, but preferred a lighter model, and also avoid an extra dependency or dependency conflicts
      Infrastructure in progress for postprocessor output
       Layered approach as we allow for multiple formats
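A minimal sketch of the read pattern described above, assuming a simplified fixed-size section header (the real file format differs): rank 0 reads and broadcasts each header, then all ranks read their own portion of the section data with a collective call.

```c
#include <mpi.h>
#include <stdint.h>

/* Hypothetical fixed-size section header, for the sketch only. */
typedef struct {
  uint64_t n_values;    /* global number of values in the section */
  uint64_t value_size;  /* size of one value in bytes */
} section_header_t;

/* Read one section: global header read + broadcast, then a collective
   read where each rank reads its own portion at its own offset. */
static void
read_section(MPI_File f, MPI_Offset *offset, MPI_Comm comm, void *local_buf,
             MPI_Offset local_start, int local_count, MPI_Datatype type)
{
  int rank, type_size;
  section_header_t h;

  MPI_Comm_rank(comm, &rank);

  if (rank == 0)
    MPI_File_read_at(f, *offset, &h, (int)sizeof(h), MPI_BYTE,
                     MPI_STATUS_IGNORE);
  MPI_Bcast(&h, (int)sizeof(h), MPI_BYTE, 0, comm);
  *offset += (MPI_Offset)sizeof(h);

  /* Collective read: a different portion of the data for each rank. */
  MPI_Type_size(type, &type_size);
  MPI_File_read_at_all(f, *offset + local_start * type_size,
                       local_buf, local_count, type, MPI_STATUS_IGNORE);

  *offset += (MPI_Offset)(h.n_values * h.value_size);
}
```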

25        Open Source CFD International Conference 2009
Parallel IO (2/2)

     Parallel I/O only of benefit with parallel filesystems
      Use of MPI IO may be disabled either at build time, or for a given file using
      specific hints
      Without MPI IO, data for each block is written or read successively by rank 0,
      using the same FVM file API




     Not much feedback yet, but initial results are disappointing
      Similar performance with and without MPI I/O on at least 2 systems
       Whether using MPI_File_read/write_at_all or MPI_File_read/write_all
       Need to retest this, forcing fewer processes into the MPI I/O communicator
      Bugs encountered in several MPI I/O implementations
26       Open Source CFD International Conference 2009
Ongoing work and future directions




27   Open Source CFD International Conference 2009
Parallelization of mesh joining (2008-2009)

     Parallelizing this algorithm requires the same main steps as the serial
     algorithm:
       Detect intersections (within a given tolerance) between edges of overlapping faces
         Uses a parallel octree of face bounding boxes, built in a bottom-up fashion (no balance condition required); the box-overlap test is sketched after this list
      Subdivide edges according to inserted intersection vertices
       Merge coincident or nearly-coincident vertices/intersections
         This is the most complex step
          It must be synchronized in parallel
          The choice of merging criteria has a profound impact on the quality of the resulting mesh
      Re-build sub-faces
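A sketch of the box-overlap test underlying the intersection-detection step (illustrative only; the actual octree-based search is more involved): two face bounding boxes, each enlarged by the joining tolerance, are candidates for edge intersection only if they overlap along all three axes.

```c
#include <stdbool.h>

typedef struct {
  double min[3];
  double max[3];
} bbox_t;

/* True if two face bounding boxes, each enlarged by 'tol', overlap.
   Only such candidate pairs need the (much more expensive) edge
   intersection tests. */
static bool
bbox_overlap(const bbox_t *a, const bbox_t *b, double tol)
{
  for (int i = 0; i < 3; i++) {
    if (a->max[i] + tol < b->min[i] - tol) return false;
    if (b->max[i] + tol < a->min[i] - tol) return false;
  }
  return true;
}
```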

     With parallel mesh joining, the most memory-intensive serial
     preprocessing step is removed
       We will add parallel mesh "append" within a few months (for version 2.1); this will allow generation of huge meshes even with serial meshing tools




28        Open Source CFD International Conference 2009
Coupling of Code_Saturne with itself
     Objective
      coupling of different models (RANS/LES)
      fluid-structure interaction with large displacements
      rotating machines
     Two kinds of communications
      data exchange at boundaries for interface coupling
      volume forcing for overlapping regions
     Still under development, but ...
      data exchange already implemented in FVM library
         optimised localisation algorithm
         compliance with parallel/parallel coupling
      prototype versions with promising results
         more work needed on conservativity at the exchange
      first version adapted to pump modelling implemented in version 2.0
         rotor/stator coupling
         compares favourably with CFX




29         Open Source CFD International Conference 2009
Multigrid

     Currently, multigrid coarsening does not cross processor
     boundaries
       This implies that on p processors, the coarsest matrix may not contain fewer than p cells
       With a high processor count, fewer grid levels will be used, and solving for the coarsest matrix may be significantly more expensive than with a low processor count
        This reduces scalability, and may be checked (if suspected) using the solver summary info at the end of the log file


      Planned solution: move grids to the nearest rank multiple of 4 or 8 when the mean local grid size is too small (see the sketch below)
      The communication pattern is not expected to change too much, as partitioning
      is of a recursive nature, and should already exhibit a “multigrid” nature
      This may be less optimal than repartitioning at each level, but setup time
      should also remain much cheaper
        Important, as grids may be rebuilt each time step
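A sketch of the planned rank-reduction rule, with assumed parameter names (the actual criterion may differ): when the mean number of cells per rank on a coarse level drops below a threshold, the grid is gathered onto 1 rank out of every 4 or 8.

```c
/* Given a coarse grid with n_g_cells cells distributed over n_ranks ranks,
   return the rank step to gather onto (1 = keep all ranks, 4 or 8 = gather).
   min_cells_per_rank is an assumed tuning parameter. */
static int
grid_gather_step(unsigned long n_g_cells, int n_ranks,
                 unsigned long min_cells_per_rank)
{
  unsigned long mean = n_g_cells / (unsigned long)n_ranks;

  if (mean >= min_cells_per_rank || n_ranks < 8)
    return 1;                       /* grid is still large enough */
  else if (mean * 4 >= min_cells_per_rank)
    return 4;                       /* gather onto every 4th rank */
  else
    return 8;                       /* gather onto every 8th rank */
}
```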
30       Open Source CFD International Conference 2009
Partitioning
     We currently use METIS or SCOTCH, but should move to
     ParMETIS or Pt-SCOTCH within a few months
      The current infrastructure makes this quite easy
     We have recently added a "backup" partitioning based on space-filling curves
       We currently use the Z (Morton) curve, from our octree construction for parallel joining (see the key-computation sketch below), but appropriate changes to the coordinate comparison rules should allow switching to a Hilbert curve (reputed to lead to better partitioning)
       This is fully parallel and deterministic
       Performance on initial tests is about 20% worse on a single 10-million cell case on 256 processes
       reasonable compared to
       unoptimized partitioning
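A sketch of the Z-curve (Morton) key computation behind this space-filling-curve partitioning: cell centre coordinates, normalized to the global bounding box, are quantized and their bits interleaved; sorting cells by this key and cutting the sorted list into equal chunks yields the partitions. A Hilbert curve would only change how the key is computed.

```c
#include <stdint.h>

/* Spread the lower 21 bits of x so that two zero bits separate each bit. */
static uint64_t
spread_bits_3d(uint64_t x)
{
  x &= 0x1fffff;  /* keep 21 bits */
  x = (x | x << 32) & 0x1f00000000ffffULL;
  x = (x | x << 16) & 0x1f0000ff0000ffULL;
  x = (x | x <<  8) & 0x100f00f00f00f00fULL;
  x = (x | x <<  4) & 0x10c30c30c30c30c3ULL;
  x = (x | x <<  2) & 0x1249249249249249ULL;
  return x;
}

/* Morton (Z-order) key of a cell centre, with coordinates assumed already
   normalized to [0, 1] over the global bounding box. */
static uint64_t
morton_key(double x, double y, double z)
{
  const double scale = 2097151.0;  /* 2^21 - 1 */
  uint64_t xi = (uint64_t)(x * scale);
  uint64_t yi = (uint64_t)(y * scale);
  uint64_t zi = (uint64_t)(z * scale);
  return spread_bits_3d(xi)
       | (spread_bits_3d(yi) << 1)
       | (spread_bits_3d(zi) << 2);
}
```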




31        Open Source CFD International Conference 2009
Tool chain evolution
     Code_Saturne V1.3 (current production version) added many
     HPC-oriented improvements compared to prior versions:
      Post-processor output handled by FVM / Kernel
      Ghost cell construction handled by FVM / Kernel
        Up to 40% gain in preprocessor memory peak compared to V1.2
        Parallelized and scales (manages 2 ghost cell sets and multiple periodicities)
       Well adapted up to 150 million cells (with 64 GB for preprocessing)
        All fundamental limitations are pre-processing related


           Meshes → Pre-Processor (serial run) → Kernel + FVM (distributed run) → Post-processing output

     Version 2.0 separates partitioning from preprocessing
      Also reduces their memory footprint a bit, moving newly parallelized operations
      to the kernel

       Meshes → Pre-Processor (serial run) → Partitioner (serial run) → Kernel + FVM (distributed run) → Post-processing output

32        Open Source CFD International Conference 2009
Future direction: Hybrid MPI / OpenMP (1/2)

     Currently, a pure MPI model is used:
      Everything is parallel, synchronization is explicit when required
     On multiprocessor / multicore nodes, shared memory
     parallelism could also be used (using OpenMP directives)
       Parallel sections must be marked, and parallel loops must avoid modifying the same values
        Specific numberings must be used, similar to those used for vectorization, but with different constraints:
          avoid false sharing, keep locality to limit cache misses (see the face-loop sketch below)
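A sketch of why a specific numbering is needed: a face-based accumulation writes to both adjacent cells, so faces are assumed here to be pre-grouped (in the spirit of the renumberings used for vectorization) such that no two faces of a group share a cell, and only the loop inside a group is threaded. face_group_idx is a hypothetical precomputed index, not the actual Code_Saturne scheme.

```c
/* Face-based accumulation cell_sum[cell] += flux, parallelized with OpenMP.
   face_group_idx delimits groups of faces precomputed so that no two faces
   in the same group touch the same cell. */
void
face_accumulate(int           n_groups,
                const int    *face_group_idx,  /* size n_groups + 1 */
                const int    *face_cells,      /* 2 cell ids per face */
                const double *face_flux,
                double       *cell_sum)
{
  for (int g = 0; g < n_groups; g++) {
    /* Within a group, faces touch disjoint cells: safe to thread. */
    #pragma omp parallel for
    for (int f = face_group_idx[g]; f < face_group_idx[g+1]; f++) {
      int c0 = face_cells[2*f];
      int c1 = face_cells[2*f + 1];
      cell_sum[c0] += face_flux[f];
      cell_sum[c1] -= face_flux[f];
    }
  }
}
```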




33       Open Source CFD International Conference 2009
Future direction: Hybrid MPI / OpenMP (2/2)

     Hybrid MPI / OpenMP is being tested
      IBM is testing this on Blue Gene/P
      Requires work on renumbering algorithms
       OpenMP parallelism would ease packaging / installation on workstations
          No dependency on MPI library choices (source- but not binary-compatible), only on the compiler runtime
          Good enough for current multicore workstations
           Coupling the code with itself or with SYRTHES 4 will still require MPI
      The main goal is to allow MPI communicators of "only" tens of thousands of ranks on machines with 100,000 cores
      Performance benefits expected mainly at the very high end
      Reduce risk of medium-term issues with MPI_Alltoallv used in I/O and parallelism-related
      data redistribution
        Though sparse collective algorithms are the long-term solution for this specific issue




34        Open Source CFD International Conference 2009
Code_Saturne HPC roadmap
2003: Following the Civaux thermal fatigue event. Computations enable a better understanding of the wall thermal loading in an injection; knowing the root causes of the event ⇒ define a new design to avoid this problem.
   10^6 cells, 3×10^13 operations
   Fujitsu VPP 5000, 1 of 4 vector processors, 2-month computation
   ~1 GB of storage, 2 GB of memory
   Limiting factor: power of the computer

2006: Computation with an LES approach for turbulence modelling, refined mesh near the wall.
   10^7 cells, 6×10^14 operations
   Cluster, IBM Power5, 400 processors, 9 days
   ~15 GB of storage, 25 GB of memory
   Limiting factor: pre-processing not parallelized

2007: Part of a fuel assembly (3 grid assemblies).
   10^8 cells, 10^16 operations
   IBM Blue Gene/L « Frontier », 8000 processors, ~1 month
   ~200 GB of storage, 250 GB of memory
   Limiting factors: pre-processing not parallelized, mesh generation

2010: 9 fuel assemblies. No experimental approach up to now; will enable the study of side effects implied by the flow around neighbouring fuel assemblies, and a better understanding of vibration phenomena and wear-out of the rods.
   10^9 cells, 3×10^17 operations
   30 times the power of IBM Blue Gene/L « Frontier », ~1 month
   ~1 TB of storage, 2.5 TB of memory
   Limiting factors: pre-processing not parallelized, mesh generation, scalability of the solver

2015: The whole reactor vessel.
   10^10 cells, 5×10^18 operations
   500 times the power of IBM Blue Gene/L « Frontier », ~1 month
   ~10 TB of storage, 25 TB of memory
   Limiting factors: pre-processing not parallelized, mesh generation, scalability of the solver, visualisation
35                 Open Source CFD International Conference 2009
Thank you for your attention!




36   Open Source CFD International Conference 2009
Additional Notes




37   Open Source CFD International Conference 2009
Load imbalance (1/3)

     In this example, using 8 partitions (with METIS), we
     have the following local minima and maxima:
      Cells:
      416 / 440 (6% imbalance)
      Cells + ghost cells:
      469/519 (11% imbalance)
      Interior faces:
      852/946 (11% imbalance)
      Most loops are on cells, but some are on cells + ghost cells, and the MatVec operates on cells + faces (the imbalance figures above follow from a simple ratio, sketched below)
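The percentages above follow from the convention imbalance = (max - min) / min; a minimal check:

```c
#include <stdio.h>

/* Imbalance as used on this slide: (max - min) / min. */
static double
imbalance(double v_min, double v_max)
{
  return (v_max - v_min) / v_min;
}

int
main(void)
{
  printf("cells:          %.0f %%\n", 100.0 * imbalance(416, 440));  /* ~6 %  */
  printf("cells + ghosts: %.0f %%\n", 100.0 * imbalance(469, 519));  /* ~11 % */
  printf("interior faces: %.0f %%\n", 100.0 * imbalance(852, 946));  /* ~11 % */
  return 0;
}
```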


38       Open Source CFD International Conference 2009
Load imbalance (2/3)

     If load imbalance increases with processor count,
     scalability decreases

     If load imbalance reaches a high value (say 30% to
     50%) but does not increase, scalability is maintained,
     though some processor power is wasted
       Perfect balancing is impossible to reach, as different loops show different imbalance levels, and synchronizations may be required between these loops
        The PCG solver uses both MatVec and dot products
      Load imbalance might be reduced using weights for domain
      partitioning, with Cell weight = 1 + f(n_faces)


39       Open Source CFD International Conference 2009
Load imbalance (3/3)

     Another possible source of load imbalance is different
     cache miss rates on different ranks
      Difficult to estimate a priori
       With otherwise balanced loops, if one processor has a cache miss every 300 instructions and another a cache miss every 400 instructions, and considering that the cost of a cache miss is at least 100 instructions, the corresponding imbalance reaches 20%




40       Open Source CFD International Conference 2009

Contenu connexe

Tendances

Ad hoc routing
Ad hoc routingAd hoc routing
Ad hoc routingits
 
BonFIRE TridentCom presentation
BonFIRE TridentCom presentationBonFIRE TridentCom presentation
BonFIRE TridentCom presentationBonFIRE
 
H.264 Library
H.264 LibraryH.264 Library
H.264 LibraryVideoguy
 
Resource to Performance Tradeoff Adjustment for Fine-Grained Architectures ─A...
Resource to Performance Tradeoff Adjustment for Fine-Grained Architectures ─A...Resource to Performance Tradeoff Adjustment for Fine-Grained Architectures ─A...
Resource to Performance Tradeoff Adjustment for Fine-Grained Architectures ─A...Fahad Cheema
 
DUNE on current and next generation HPC Platforms
DUNE on current and next generation HPC PlatformsDUNE on current and next generation HPC Platforms
DUNE on current and next generation HPC PlatformsMarkus Blatt
 
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...Lviv Startup Club
 
Define location of Preplaced cells(http://www.vlsisystemdesign.com/PD-Flow.php)
Define location of Preplaced cells(http://www.vlsisystemdesign.com/PD-Flow.php)Define location of Preplaced cells(http://www.vlsisystemdesign.com/PD-Flow.php)
Define location of Preplaced cells(http://www.vlsisystemdesign.com/PD-Flow.php)VLSI SYSTEM Design
 
1 introduction to vlsi physical design
1 introduction to vlsi physical design1 introduction to vlsi physical design
1 introduction to vlsi physical designsasikun
 
Overview of the TriBITS Lifecycle Model
Overview of the TriBITS Lifecycle ModelOverview of the TriBITS Lifecycle Model
Overview of the TriBITS Lifecycle ModelSoftwarePractice
 
libHPC: Software sustainability and reuse through metadata preservation
libHPC: Software sustainability and reuse through metadata preservationlibHPC: Software sustainability and reuse through metadata preservation
libHPC: Software sustainability and reuse through metadata preservationSoftwarePractice
 
Using Many-Core Processors to Improve the Performance of Space Computing Plat...
Using Many-Core Processors to Improve the Performance of Space Computing Plat...Using Many-Core Processors to Improve the Performance of Space Computing Plat...
Using Many-Core Processors to Improve the Performance of Space Computing Plat...Fisnik Kraja
 
Restructuring Campus CI -- UCSD-A LambdaCampus Research CI and the Quest for ...
Restructuring Campus CI -- UCSD-A LambdaCampus Research CI and the Quest for ...Restructuring Campus CI -- UCSD-A LambdaCampus Research CI and the Quest for ...
Restructuring Campus CI -- UCSD-A LambdaCampus Research CI and the Quest for ...Larry Smarr
 
Collaborative modeling and co simulation with destecs - a pilot study
Collaborative modeling and co simulation with destecs - a pilot studyCollaborative modeling and co simulation with destecs - a pilot study
Collaborative modeling and co simulation with destecs - a pilot studyDaniele Gianni
 
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...Fisnik Kraja
 

Tendances (18)

Ad hoc routing
Ad hoc routingAd hoc routing
Ad hoc routing
 
Evaluation aodv
Evaluation aodvEvaluation aodv
Evaluation aodv
 
BonFIRE TridentCom presentation
BonFIRE TridentCom presentationBonFIRE TridentCom presentation
BonFIRE TridentCom presentation
 
H.264 Library
H.264 LibraryH.264 Library
H.264 Library
 
Resource to Performance Tradeoff Adjustment for Fine-Grained Architectures ─A...
Resource to Performance Tradeoff Adjustment for Fine-Grained Architectures ─A...Resource to Performance Tradeoff Adjustment for Fine-Grained Architectures ─A...
Resource to Performance Tradeoff Adjustment for Fine-Grained Architectures ─A...
 
DUNE on current and next generation HPC Platforms
DUNE on current and next generation HPC PlatformsDUNE on current and next generation HPC Platforms
DUNE on current and next generation HPC Platforms
 
Frame mode mpls
Frame mode mplsFrame mode mpls
Frame mode mpls
 
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
 
73
7373
73
 
Define location of Preplaced cells(http://www.vlsisystemdesign.com/PD-Flow.php)
Define location of Preplaced cells(http://www.vlsisystemdesign.com/PD-Flow.php)Define location of Preplaced cells(http://www.vlsisystemdesign.com/PD-Flow.php)
Define location of Preplaced cells(http://www.vlsisystemdesign.com/PD-Flow.php)
 
1 introduction to vlsi physical design
1 introduction to vlsi physical design1 introduction to vlsi physical design
1 introduction to vlsi physical design
 
Overview of the TriBITS Lifecycle Model
Overview of the TriBITS Lifecycle ModelOverview of the TriBITS Lifecycle Model
Overview of the TriBITS Lifecycle Model
 
libHPC: Software sustainability and reuse through metadata preservation
libHPC: Software sustainability and reuse through metadata preservationlibHPC: Software sustainability and reuse through metadata preservation
libHPC: Software sustainability and reuse through metadata preservation
 
Using Many-Core Processors to Improve the Performance of Space Computing Plat...
Using Many-Core Processors to Improve the Performance of Space Computing Plat...Using Many-Core Processors to Improve the Performance of Space Computing Plat...
Using Many-Core Processors to Improve the Performance of Space Computing Plat...
 
Restructuring Campus CI -- UCSD-A LambdaCampus Research CI and the Quest for ...
Restructuring Campus CI -- UCSD-A LambdaCampus Research CI and the Quest for ...Restructuring Campus CI -- UCSD-A LambdaCampus Research CI and the Quest for ...
Restructuring Campus CI -- UCSD-A LambdaCampus Research CI and the Quest for ...
 
Gareth edwards xilinx
Gareth edwards xilinxGareth edwards xilinx
Gareth edwards xilinx
 
Collaborative modeling and co simulation with destecs - a pilot study
Collaborative modeling and co simulation with destecs - a pilot studyCollaborative modeling and co simulation with destecs - a pilot study
Collaborative modeling and co simulation with destecs - a pilot study
 
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
 

Similaire à Presentation of the open source CFD code Code_Saturne

QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)Heiko Joerg Schick
 
Directive-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous ComputingDirective-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous ComputingRuymán Reyes
 
Automating the Configuration of the FlexRay Communication Cycle
Automating the Configuration of the FlexRay Communication CycleAutomating the Configuration of the FlexRay Communication Cycle
Automating the Configuration of the FlexRay Communication CycleNicolas Navet
 
Close encounters in MDD: when Models meet Code
Close encounters in MDD: when Models meet CodeClose encounters in MDD: when Models meet Code
Close encounters in MDD: when Models meet Codelbergmans
 
Close Encounters in MDD: when models meet code
Close Encounters in MDD: when models meet codeClose Encounters in MDD: when models meet code
Close Encounters in MDD: when models meet codelbergmans
 
Developing Real-Time Systems on Application Processors
Developing Real-Time Systems on Application ProcessorsDeveloping Real-Time Systems on Application Processors
Developing Real-Time Systems on Application ProcessorsToradex
 
Migration of a computation cluster to Debian
Migration of a computation cluster to DebianMigration of a computation cluster to Debian
Migration of a computation cluster to DebianLogilab
 
Parallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPParallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPAnil Bohare
 
A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...ChangWoo Min
 
Industrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric computeIndustrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric computePerry Lea
 
Moldex3D, Structural Analysis, and HyperStudy Integrated in HyperWorks Platfo...
Moldex3D, Structural Analysis, and HyperStudy Integrated in HyperWorks Platfo...Moldex3D, Structural Analysis, and HyperStudy Integrated in HyperWorks Platfo...
Moldex3D, Structural Analysis, and HyperStudy Integrated in HyperWorks Platfo...Altair
 
Fugaku, the Successes and the Lessons Learned
Fugaku, the Successes and the Lessons LearnedFugaku, the Successes and the Lessons Learned
Fugaku, the Successes and the Lessons LearnedRCCSRENKEI
 
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors Michelle Holley
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterTim Ellison
 
OpenStack and OpenFlow Demos
OpenStack and OpenFlow DemosOpenStack and OpenFlow Demos
OpenStack and OpenFlow DemosBrent Salisbury
 
Application scenarios in streaming oriented embedded-system design
Application scenarios in streaming oriented embedded-system designApplication scenarios in streaming oriented embedded-system design
Application scenarios in streaming oriented embedded-system designMr. Chanuwan
 

Similaire à Presentation of the open source CFD code Code_Saturne (20)

QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
 
Directive-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous ComputingDirective-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous Computing
 
Automating the Configuration of the FlexRay Communication Cycle
Automating the Configuration of the FlexRay Communication CycleAutomating the Configuration of the FlexRay Communication Cycle
Automating the Configuration of the FlexRay Communication Cycle
 
Userspace networking
Userspace networkingUserspace networking
Userspace networking
 
Close encounters in MDD: when Models meet Code
Close encounters in MDD: when Models meet CodeClose encounters in MDD: when Models meet Code
Close encounters in MDD: when Models meet Code
 
Close Encounters in MDD: when models meet code
Close Encounters in MDD: when models meet codeClose Encounters in MDD: when models meet code
Close Encounters in MDD: when models meet code
 
Developing Real-Time Systems on Application Processors
Developing Real-Time Systems on Application ProcessorsDeveloping Real-Time Systems on Application Processors
Developing Real-Time Systems on Application Processors
 
Migration of a computation cluster to Debian
Migration of a computation cluster to DebianMigration of a computation cluster to Debian
Migration of a computation cluster to Debian
 
Parallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPParallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMP
 
A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...
 
Industrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric computeIndustrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric compute
 
Moldex3D, Structural Analysis, and HyperStudy Integrated in HyperWorks Platfo...
Moldex3D, Structural Analysis, and HyperStudy Integrated in HyperWorks Platfo...Moldex3D, Structural Analysis, and HyperStudy Integrated in HyperWorks Platfo...
Moldex3D, Structural Analysis, and HyperStudy Integrated in HyperWorks Platfo...
 
Fugaku, the Successes and the Lessons Learned
Fugaku, the Successes and the Lessons LearnedFugaku, the Successes and the Lessons Learned
Fugaku, the Successes and the Lessons Learned
 
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
 
43
4343
43
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark faster
 
Report_Ines_Swayam
Report_Ines_SwayamReport_Ines_Swayam
Report_Ines_Swayam
 
OpenStack and OpenFlow Demos
OpenStack and OpenFlow DemosOpenStack and OpenFlow Demos
OpenStack and OpenFlow Demos
 
Application scenarios in streaming oriented embedded-system design
Application scenarios in streaming oriented embedded-system designApplication scenarios in streaming oriented embedded-system design
Application scenarios in streaming oriented embedded-system design
 
Concept of thread
Concept of threadConcept of thread
Concept of thread
 

Dernier

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Presentation of the open source CFD code Code_Saturne

  • 1. HPC and CFD at EDF with Code_Saturne Yvan Fournier, Jérôme Bonelle EDF R&D Fluid Dynamics, Power Generation and Environment Department Open Source CFD International Conference Barcelona 2009
  • 2. Summary 1. General Elements on Code_Saturne 2. Real-world performance of Code_Saturne 3. Example applications: fuel assemblies 4. Parallel implementation of Code_Saturne 5. Ongoing work and future directions 2 Open Source CFD International Conference 2009
  • 3. General elements on Code_Saturne 3 Open Source CFD International Conference 2009
  • 4. Code_Saturne: main capabilities Physical modelling Single-phase laminar and turbulent flows: k-ε, k-ω SST, v2f, RSM, LES Radiative heat transfer (DOM, P-1) Combustion of coal, gas, heavy fuel oil (EBU, pdf, LWP) Electric arc and Joule effect Lagrangian module for dispersed particle tracking Atmospheric flows (aka Mercure_Saturne) Specific engineering module for cooling towers ALE method for deformable meshes Conjugate heat transfer (SYRTHES & 1D) Common structure with NEPTUNE_CFD for Eulerian multiphase flows Flexibility Portability (UNIX, Linux and MacOS X) Standalone GUI and integrated in SALOME platform Parallel on distributed memory machines Periodic boundaries (parallel, arbitrary interfaces) Wide range of unstructured meshes with arbitrary interfaces Code coupling capabilities (Code_Saturne/Code_Saturne, Code_Saturne/Code_Aster, ...) 4 Open Source CFD International Conference 2009
  • 5. Code_Saturne: general features Technology Co-located finite volume, arbitrary unstructured meshes (polyhedral cells), predictor-corrector method 500 000 lines of code, 50% FORTRAN 90, 40% C, 10% Python Development 1998: Prototype (long time EDF in-house experience, ESTET-ASTRID, N3S, ...) 2000: version 1.0 (basic modelling, wide range of meshes) 2001: Qualification for single phase nuclear thermal-hydraulic applications 2004: Version 1.1 (complex physics, LES, parallel computing) 2006: Version 1.2 (state of the art turbulence models, GUI) 2008: Version 1.3 (massively parallel, ALE, code coupling, ...) Released as open source (GPL licence) 2008: Development version 1.4 (parallel IO, multigrid, atmospheric, cooling towers, ...) 2009: Development version 2.0-beta (parallel mesh joining, code coupling, easy install & packaging, extended GUI), scheduled for industrial release beginning of 2010 Code_Saturne developed under Quality Assurance 5 Open Source CFD International Conference 2009
  • 6. Code_Saturne subsystems External libraries (EDF, LGPL): BFT (Base Functions and Types), FVM (Finite Volume Mesh), MEI (Mathematical Expression Interpreter). [Diagram: meshes are read by the Preprocessor (mesh import, mesh joining, periodicity, domain partitioning); the parallel Kernel / CFD Solver handles parallel treatment and parallel mesh management; the BFT library provides the run-time environment and memory logging; the FVM library provides code coupling, restart files, parallel mesh setup and post-processing output, and couples Code_Saturne with SYRTHES, Code_Aster, the SALOME platform, ...; the MEI library interprets mathematical expressions; the GUI writes the XML data file.] 6 Open Source CFD International Conference 2009
  • 7. Code_Saturne environment Graphical User Interface setting up of calculation parameters parameters stored in an XML file interactive launch of calculations some specific physics not yet covered by the GUI advanced setting up by Fortran user routines Integration in the SALOME platform extension of GUI capabilities mouse selection of boundary zones advanced user file management from CAD to post-processing in one tool 7 Open Source CFD International Conference 2009
  • 8. Allowable mesh examples [Figures: a mesh with stretched cells and hanging nodes; a composite mesh of a PWR lower plenum; 3D polyhedral cells.] 8 Open Source CFD International Conference 2009
  • 9. Joining of non-conforming meshes Arbitrary interfaces Meshes may be contained in one single file or in several separate files, in any order Arbitrary interfaces can be selected by mesh references Caution must be exercised if arbitrary interfaces are used: in critical regions, or with LES with very different mesh refinements, or on curved CAD surfaces Often used in ways detrimental to mesh quality, but a functionality we cannot do without as long as we do not have a proven alternative. Joining of meshes built in several pieces may also be used to circumvent meshing tool memory limitations. Periodicity is also constructed as an extension of mesh joining. 9 Open Source CFD International Conference 2009
  • 10. Real-world performance of Code_Saturne 10 Open Source CFD International Conference 2009
  • 11. Code_Saturne features of note for HPC Segregated solver All variables are solved for independently, coupling terms are explicit Diagonal-preconditioned CG used for the pressure equation, Jacobi (or bi-CGstab) used for other variables More importantly, matrices have no block structure, and are very sparse Typically 7 non-zeroes per row for hexahedra, 5 for tetrahedra Indirect addressing + no dense blocks means fewer opportunities for MatVec optimization, as memory bandwidth is as important as peak flops. Linear equation solvers usually amount to 80% of CPU cost (dominated by pressure), gradient reconstruction about 20% The larger the mesh, the higher the relative cost of the pressure step 11 Open Source CFD International Conference 2009
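To see why these matrix-vector products are bound by memory bandwidth rather than by peak flops, consider a minimal CSR-style sparse MatVec sketch in C (illustrative only; the names and layout are not Code_Saturne's actual internal matrix structures):

    /* Minimal CSR sparse matrix-vector product: y = A.x
       Illustrative sketch only; names are hypothetical. */
    #include <stddef.h>

    void csr_mat_vec(size_t        n_rows,
                     const size_t *row_index,  /* size n_rows + 1        */
                     const size_t *col_id,     /* size row_index[n_rows] */
                     const double *val,        /* non-zero values        */
                     const double *x,
                     double       *y)
    {
      for (size_t i = 0; i < n_rows; i++) {
        double sum = 0.0;
        /* About 7 non-zeroes per row for hexahedra, 5 for tetrahedra:
           each iteration does 2 flops but loads an index, a matrix value
           and an indirectly addressed x[col_id[j]], so memory traffic
           dominates and there is little room for dense-block reuse. */
        for (size_t j = row_index[i]; j < row_index[i+1]; j++)
          sum += val[j] * x[col_id[j]];
        y[i] = sum;
      }
    }

With roughly 7 non-zeroes per row, each row performs about 14 flops while streaming indices, matrix values and indirectly addressed vector entries from memory, which leaves little room for the dense-block optimizations available to structured or block-coupled solvers.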
  • 12. Current performance (1/3) 2 LES test cases (most I/O factored out): 1 M cells: (n_cells_min + n_cells_max)/2 = 880 at 1024 cores, 109 at 8192; 10 M cells: (n_cells_min + n_cells_max)/2 = 9345 at 1024 cores, 1150 at 8192 [Plots: elapsed time vs. number of cores for the FATHER (1 M hexahedra) and HYPI (10 M hexahedra) LES test cases, on Opteron + InfiniBand, Opteron + Myrinet, NovaScale and Blue Gene/L.] 12 Open Source CFD International Conference 2009
  • 13. Current performance (2/3) RANS, 100 M tetrahedra + polyhedra (most I/O factored out) Polyhedra due to mesh joinings may lead to higher load imbalance in local MatVec for large core counts 96286/102242 min/max cells/core at 1024 cores, 11344/12781 min/max cells/core at 8192 cores [Plot: elapsed time per iteration vs. number of cores for the FA grid RANS test case, on NovaScale and Blue Gene/L (CO and VN modes).] 13 Open Source CFD International Conference 2009
  • 14. Current performance (3/3) Efficiency often goes through an optimum (due to better cache hit rates) before dropping (due to latency induced by parallel synchronization) Example shown here: HYPI (10 M cell LES test case) [Plot: parallel efficiency vs. number of MPI ranks for the HYPI case, on the Chatou cluster, Tantale, Platine and Blue Gene.] 14 Open Source CFD International Conference 2009
  • 15. High Performance Computing with Code_Saturne Code_Saturne used extensively on HPC machines in-house EDF clusters CCRT calculation centre (CEA based) EDF IBM Blue Gene machines (8 000 and 32 000 cores) Run also on MareNostrum (Barcelona Supercomputing Center), Cray XT, … Code_Saturne used as reference in the PRACE European project reference code for CFD benchmarks on six large European HPC centres Code_Saturne obtained “gold medal” status in scalability by Daresbury Laboratory (UK, HPCx machine) 15 Open Source CFD International Conference 2009
  • 16. Example HPC applications: fuel assemblies 16 Open Source CFD International Conference 2009
  • 17. Fuel Assembly Studies Conflicting design goals Good thermal mixing properties, requiring turbulent flow Limit head loss Limit vibrations Fuel rods held by dimples and springs, and not welded, as they lengthen slightly over the years due to irradiation Complex core geometry Circa 150 to 250 fuel assemblies per core depending on reactor type, 8 to 10 grids per fuel assembly, 17x17 grid (mostly fuel rods, 24 guide tubes) Geometry almost periodic, except for the mix of several fuel assembly types in a given core (reload by 1/3 or 1/4) Inlet and wall conditions not periodic, heat production not uniform at fine scale Why we study these flows Deformation may lead to difficulties in core unload/reload Turbulence-induced vibration of fuel assemblies in PWR power plants is a potential cause of deformation and of fretting wear damage These may lead to weeks or months of interruption of operations 17 Open Source CFD International Conference 2009
  • 18. Prototype FA calculation with Code_Saturne PWR nuclear reactor mixing grid mock-up (5x5) 100 million cells calculation run on 4 000 to 8 000 cores Main issue is mesh generation 18 Open Source CFD International Conference 2009
  • 19. LES simulation of reduced FA domain Particular features for LES SIMPLEC algorithm with Rhie and Chow interpolation 2nd order in time (Crank-Nicolson and Adams-Bashforth) 2nd order in space (fully centered, with sub-iterations for non-orthogonal faces) Fully hexahedral mesh, 8 million cells Boundary Conditions Implicit periodicity in x and y directions Constant inlet conditions Wall function where needed Free outlet Simulation 1 million time-steps: 40 flow passes, 20 flow passes for averaging (no homogeneous direction) CFLmax = 0.8 (dt = 5×10⁻⁶ s) BlueGene/L system, 1024 processors Per time-step: 5 s For 100 000 time-steps: 1 week 19 Open Source CFD International Conference 2009
  • 20. Parallel implementation of Code_Saturne 20 Open Source CFD International Conference 2009
  • 21. Base parallel operations (1/4) Distributed memory parallelism using domain partitioning Use classical “ghost cell” method for both parallelism and periodicity Most operations require only ghost cells sharing faces Extended neighborhoods for gradients also require ghost cells sharing vertices Global reductions (dot products) are also used, especially by the preconditioned conjugate gradient algorithm Periodicity uses the same mechanism Vector and tensor rotation also required 21 Open Source CFD International Conference 2009
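As an illustration of the ghost-cell ("halo") exchange described above, here is a minimal non-blocking MPI sketch in C; the packed buffers and names are hypothetical, not Code_Saturne's actual halo structures:

    /* Minimal halo exchange sketch: each rank sends the values of its
       border cells to its neighbors and receives their ghost-cell
       values.  Names and data layout are illustrative only. */
    #include <stdlib.h>
    #include <mpi.h>

    void halo_exchange(int           n_neighbors,
                       const int    *neighbor_rank,  /* size n_neighbors     */
                       const int    *send_index,     /* size n_neighbors + 1 */
                       const double *send_buf,       /* packed border values */
                       const int    *recv_index,     /* size n_neighbors + 1 */
                       double       *ghost_vals,     /* packed ghost values  */
                       MPI_Comm      comm)
    {
      MPI_Request *req = malloc(2 * n_neighbors * sizeof(MPI_Request));

      /* Post receives for ghost values, then send border values. */
      for (int i = 0; i < n_neighbors; i++)
        MPI_Irecv(ghost_vals + recv_index[i], recv_index[i+1] - recv_index[i],
                  MPI_DOUBLE, neighbor_rank[i], 0, comm, req + i);

      for (int i = 0; i < n_neighbors; i++)
        MPI_Isend((void *)(send_buf + send_index[i]),
                  send_index[i+1] - send_index[i],
                  MPI_DOUBLE, neighbor_rank[i], 0, comm,
                  req + n_neighbors + i);

      MPI_Waitall(2 * n_neighbors, req, MPI_STATUSES_IGNORE);
      free(req);
    }

The global reductions mentioned above (dot products in the conjugate gradient solver) simply map to an MPI_Allreduce over the same communicator.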
  • 22. Base parallel operations (2/4) Use of global numbering We associate a global number to each mesh entity A specific C type (fvm_gnum_t) is used for this; currently an unsigned integer (usually 32-bit), but an unsigned long integer (64-bit) will become necessary Face-cell connectivity for hexahedral cells: size 4·n_faces, with n_faces about 3·n_cells, → size around 12·n_cells, so numbers requiring 64 bits appear around 350 million cells Currently equal to the initial (pre-partitioning) number Allows for partition-independent single-image files Essential for restart files, also used for postprocessor output Also used for legacy coupling where matches can be saved 22 Open Source CFD International Conference 2009
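A quick check of that threshold, assuming 32-bit unsigned global numbers:
\[
4\,n_{\mathrm{faces}} \approx 4 \times 3\,n_{\mathrm{cells}} = 12\,n_{\mathrm{cells}},
\qquad
12\,n_{\mathrm{cells}} > 2^{32} \approx 4.3\times 10^{9}
\;\Longrightarrow\;
n_{\mathrm{cells}} \gtrsim 3.6\times 10^{8},
\]
i.e. 64-bit global numbers become necessary at around 350 million cells, as stated above.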
  • 23. Base parallel operations (3/4) Use of global numbering Redistribution on n blocks, n blocks ≤ n cores A minimum block size may be set to avoid many small blocks (for some communication or usage schemes), or to force 1 block (for I/O with non-parallel libraries) In the future, using at most 1 of every p processors may improve MPI/IO performance if we use a smaller communicator (to be tested) 23 Open Source CFD International Conference 2009
  • 24. Base parallel operations (4/4) Conversely, simply using global numbers allows reconstructing a mapping to equivalent entities on neighboring partitions Used for parallel ghost cell construction from an initially partitioned mesh with no ghost data Arbitrary distribution, inefficient for halo exchange, but allows for simpler data-structure-related algorithms with deterministic performance bounds The owning processor is determined simply from the global number, and messages are aggregated 24 Open Source CFD International Conference 2009
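A minimal sketch of such a block distribution in C, where the owning rank follows directly from the global number (the names and the rank-step parameter are assumptions for illustration, not the FVM library's actual API):

    /* Map a 1-based global entity number to its owning rank, for
       n_blocks blocks of roughly equal size spread over ranks
       0, rank_step, 2*rank_step, ...  Illustrative only. */
    #include <stdint.h>

    typedef struct {
      uint64_t block_size;  /* entities per block                      */
      int      rank_step;   /* use 1 rank out of every rank_step ranks */
    } block_dist_t;

    /* Build a distribution of n_g entities over n_blocks blocks. */
    static block_dist_t
    block_dist(uint64_t n_g, int n_blocks, int rank_step)
    {
      block_dist_t d;
      d.block_size = (n_g + (uint64_t)n_blocks - 1) / (uint64_t)n_blocks;
      d.rank_step  = rank_step;
      return d;
    }

    /* Owning rank of a 1-based global number: determined directly from
       the number itself, so messages to a block can be aggregated per
       destination rank before sending. */
    static int
    owning_rank(uint64_t g_num, block_dist_t d)
    {
      return (int)((g_num - 1) / d.block_size) * d.rank_step;
    }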
  • 25. Parallel IO (1/2) We prefer using single (partition-independent) files Easily run different stages or restarts of a calculation on different machines or queues Avoids having thousands or tens of thousands of files in a directory Better transparency of parallelism for the user Use MPI I/O when available Uses block to partition exchange when reading, partition to block when writing Use of indexed datatypes may be tested in the future, but will not be possible everywhere Used for reading of preprocessor and partitioner output, as well as for restart files These files use a unified binary format, consisting of a simple header and a succession of sections The MPI IO pattern is thus a succession of global reads (or local read + broadcast) for section headers and collective reading of data (with a different portion for each rank) We could switch to HDF5 but preferred a lighter model, and also avoid an extra dependency or dependency conflicts Infrastructure in progress for postprocessor output Layered approach as we allow for multiple formats 25 Open Source CFD International Conference 2009
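A minimal sketch of that read pattern with standard MPI I/O calls (a small header read identically by every rank, then a collective read of each rank's block); the file layout and names are illustrative, not the actual Code_Saturne file format:

    /* Read one section: a small header, then a collective read where
       each rank gets its own contiguous block of the data. */
    #include <stdint.h>
    #include <mpi.h>

    void read_section(MPI_File f, MPI_Offset header_offset,
                      uint64_t block_start, uint64_t block_size,
                      double *block_vals)
    {
      uint64_t header[2];  /* e.g. section size and element size (hypothetical) */

      /* All ranks read the same small header (this could equally be a
         rank-0 read followed by an MPI_Bcast). */
      MPI_File_read_at_all(f, header_offset, header, 2,
                           MPI_UINT64_T, MPI_STATUS_IGNORE);

      /* Collective read of the section body: each rank reads the
         portion corresponding to its block. */
      MPI_Offset data_offset = header_offset + 2 * sizeof(uint64_t)
                             + (MPI_Offset)(block_start * sizeof(double));
      MPI_File_read_at_all(f, data_offset, block_vals, (int)block_size,
                           MPI_DOUBLE, MPI_STATUS_IGNORE);
    }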
  • 26. Parallel IO (2/2) Parallel I/O is only of benefit with parallel filesystems Use of MPI IO may be disabled either at build time, or for a given file using specific hints Without MPI IO, data for each block is written or read successively by rank 0, using the same FVM file API Not much feedback yet, but initial results are disappointing Similar performance with and without MPI IO on at least 2 systems Whether using MPI_File_read/write_at_all or MPI_File_read/write_all Need to retest this forcing fewer processors in the MPI IO communicator Bugs encountered in several MPI/IO implementations 26 Open Source CFD International Conference 2009
  • 27. Ongoing work and future directions 27 Open Source CFD International Conference 2009
  • 28. Parallelization of mesh joining (2008-2009) Parallelizing this algorithm requires the same main steps as the serial algorithm: Detect intersections (within a given tolerance) between edges of overlapping faces Uses a parallel octree for face bounding boxes, built in a bottom-up fashion (no balance condition required) Subdivide edges according to inserted intersection vertices Merge coincident or nearly-coincident vertices/intersections This is the most complex step Must be synchronized in parallel The choice of merging criteria has a profound impact on the quality of the resulting mesh Re-build sub-faces With parallel mesh joining, the most memory-intensive serial preprocessing step is removed We will add parallel mesh « append » within a few months (for version 2.1); this will allow generation of huge meshes even with serial meshing tools 28 Open Source CFD International Conference 2009
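The face bounding boxes stored in that octree are typically compared with a simple axis-aligned overlap test enlarged by the joining tolerance; a minimal sketch in C, assuming this kind of test (illustrative, not the actual joining code):

    #include <stdbool.h>

    /* Axis-aligned bounding box: min and max corner coordinates. */
    typedef struct {
      double min[3];
      double max[3];
    } bbox_t;

    /* True if two boxes overlap once each is enlarged by a tolerance;
       this kind of test can be used to detect candidate face pairs for
       joining before running the exact edge-intersection tests. */
    static bool
    bbox_overlap(bbox_t a, bbox_t b, double tol)
    {
      for (int i = 0; i < 3; i++) {
        if (a.max[i] + tol < b.min[i] - tol) return false;
        if (b.max[i] + tol < a.min[i] - tol) return false;
      }
      return true;
    }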
  • 29. Coupling of Code_Saturne with itself Objectives coupling of different models (RANS/LES) fluid-structure interaction with large displacements rotating machines Two kinds of communications data exchange at boundaries for interface coupling volume forcing for overlapping regions Still under development, but ... data exchange already implemented in the FVM library optimised localisation algorithm compliance with parallel/parallel coupling prototype versions with promising results more work needed on conservativity at the exchange a first version adapted to pump modelling implemented in version 2.0 rotor/stator coupling compares favourably with CFX 29 Open Source CFD International Conference 2009
  • 30. Multigrid Currently, multigrid coarsening does not cross processor boundaries This implies that on p processors, the coarsest matrix may not contain less than p cells With a high processor count, fewer grid levels will be used, and solving for the coarsest matrix may be significantly more expensive than with a low processor count This reduces scalability, and may be checked (if suspected) using the solver summary info at the end of the log file Planned solution: move grids to the nearest rank multiple of 4 or 8 when the mean local grid size is too small The communication pattern is not expected to change too much, as partitioning is of a recursive nature, and should already exhibit a “multigrid” nature This may be less optimal than repartitioning at each level, but setup time should also remain much cheaper Important, as grids may be rebuilt each time step 30 Open Source CFD International Conference 2009
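A minimal sketch of that planned grid-merging rule (regrouping coarse grids onto every 4th or 8th rank when the local grid becomes too small); the threshold criterion and names are assumptions for illustration, not the implemented scheme:

    /* Decide where a rank's coarse-grid data should live at a given
       multigrid level.  If the local grid drops below a minimum size,
       ranks regroup onto every merge_stride-th rank (e.g. multiples
       of 4 or 8).  Illustrative only. */
    static int
    coarse_grid_owner(int my_rank, long n_local_rows,
                      long min_rows_per_rank, int merge_stride)
    {
      if (n_local_rows >= min_rows_per_rank)
        return my_rank;                              /* keep the grid locally    */
      return (my_rank / merge_stride) * merge_stride; /* nearest lower multiple  */
    }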
  • 31. Partitioning We currently use METIS or SCOTCH, but should move to ParMETIS or PT-SCOTCH within a few months The current infrastructure makes this quite easy We have recently added a « backup » partitioning based on space-filling curves We currently use the Z curve (from our octree construction for parallel joining), but the appropriate changes in the coordinate comparison rules should allow switching to a Hilbert curve (reputed to lead to better partitioning) This is fully parallel and deterministic Performance on initial tests is about 20% worse on a single 10-million cell case on 256 processes, which is reasonable compared to unoptimized partitioning 31 Open Source CFD International Conference 2009
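For reference, a Z-curve (Morton) key is obtained by interleaving the bits of quantized cell-centre coordinates; sorting cells by this key and cutting the sorted list into equal chunks gives a space-filling-curve partitioning. A minimal sketch in C (illustrative, not the code's actual implementation):

    #include <stdint.h>

    /* Interleave the lower 21 bits of x, y, z (cell-centre coordinates
       already quantized to an integer grid) into a 63-bit Morton /
       Z-curve key.  Sorting cells by this key yields a space-filling
       curve ordering that can be split into equal-size chunks. */
    static uint64_t
    morton3d(uint32_t x, uint32_t y, uint32_t z)
    {
      uint64_t key = 0;
      for (int b = 0; b < 21; b++) {
        key |= ((uint64_t)(x >> b) & 1) << (3*b);
        key |= ((uint64_t)(y >> b) & 1) << (3*b + 1);
        key |= ((uint64_t)(z >> b) & 1) << (3*b + 2);
      }
      return key;
    }

A Hilbert curve would keep this overall sort-and-cut scheme and only change how the keys are formed and compared, consistent with the coordinate-comparison changes mentioned above.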
  • 32. Tool chain evolution Code_Saturne V1.3 (current production version) added many HPC-oriented improvements compared to prior versions: Post-processor output handled by FVM / Kernel Ghost cell construction handled by FVM / Kernel Up to 40% gain in preprocessor memory peak compared to V1.2 Parallelized and scales (manages 2 ghost cell sets and multiple periodicities) Well adapted up to 150 million cells (with 64 Gb for preprocessing) All fundamental limitations are pre-processing related [Diagram: Meshes → Pre-Processor (serial run) → Kernel + FVM (distributed run) → Post-processing output] Version 2.0 separates partitioning from preprocessing Also reduces their memory footprint a bit, moving newly parallelized operations to the kernel [Diagram: Meshes → Pre-Processor (serial run) → Partitioner (serial run) → Kernel + FVM (distributed run) → Post-processing output] 32 Open Source CFD International Conference 2009
  • 33. Future direction: Hybrid MPI / OpenMP (1/2) Currently, a pure MPI model is used: Everything is parallel, synchronization is explicit when required On multiprocessor / multicore nodes, shared memory parallelism could also be used (using OpenMP directives) Parallel sections must be marked, and parallel loops must avoid modifying the same values Specific numberings must be used, similar to those used for vectorization, but with different constraints: Avoid false sharing, keep locality to limit cache misses 33 Open Source CFD International Conference 2009
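As an illustration of the constraint that OpenMP loops must not update the same values concurrently: cell loops parallelize directly, while face loops that scatter a flux to their two adjacent cells need a renumbering into independent groups first. A minimal sketch in C, assuming such a face-group renumbering (illustrative, not the actual kernels):

    #include <omp.h>

    /* A cell loop parallelizes trivially: each iteration writes only
       to its own cell value. */
    void scale_cells(long n_cells, double *v, double s)
    {
      #pragma omp parallel for
      for (long i = 0; i < n_cells; i++)
        v[i] *= s;
    }

    /* A face loop scatters each face flux to its two adjacent cells:
       two faces of the same cell must not be handled by two threads at
       once.  Faces are therefore renumbered into groups such that no
       two faces in a group share a cell, and the groups are processed
       one after another. */
    void add_face_fluxes(int n_groups,
                         const long *group_index,      /* size n_groups + 1  */
                         const long (*face_cells)[2],  /* cells of each face */
                         const double *flux, double *rhs)
    {
      for (int g = 0; g < n_groups; g++) {
        #pragma omp parallel for
        for (long f = group_index[g]; f < group_index[g+1]; f++) {
          rhs[face_cells[f][0]] += flux[f];
          rhs[face_cells[f][1]] -= flux[f];
        }
      }
    }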
  • 34. Future direction: Hybrid MPI / OpenMP (2/2) Hybrid MPI / OpenMP is being tested IBM is testing this on Blue Gene/P Requires work on renumbering algorithms OpenMP parallelism would ease packaging / installation on workstations No dependency on (source- but not binary-compatible) MPI library choices, only on the compiler runtime Good enough for current multicore workstations Coupling the code with itself or with SYRTHES 4 will still require MPI The main goal is to allow MPI communicators of “only” 10000’s of ranks on machines with 100000 cores Performance benefits expected mainly at the very high end Reduce risk of medium-term issues with MPI_Alltoallv used in I/O and parallelism-related data redistribution Though sparse collective algorithms are the long-term solution for this specific issue 34 Open Source CFD International Conference 2009
  • 35. Code_Saturne HPC roadmap (the original slide is a five-column table, reconstructed here by year)
2003: LES computation of the wall thermal loading in an injection, following the Civaux thermal fatigue event. No experimental approach up to now; knowing the root causes of the event ⇒ define a new design to avoid this problem. L.E.S. approach for turbulence modelling, refined mesh near the wall. 10^6 cells, 3·10^13 operations, Fujitsu VPP 5000 (1 of 4 vector processors), 2-month computation, ≈ 1 Gb of storage, 2 Gb of memory.
2006: part of a fuel assembly. 10^7 cells, 6·10^14 operations, IBM Power5 cluster, 400 processors, 9 days, ≈ 15 Gb of storage, 25 Gb of memory.
2007: 3 grid assemblies. 10^8 cells, 10^16 operations, IBM Blue Gene/L « Frontier », 8 000 processors, ≈ 1 month, ≈ 200 Gb of storage, 250 Gb of memory.
2010: 9 fuel assemblies; will enable the study of side effects implied by the flow around neighbour fuel assemblies, better understanding of vibration phenomena and wear-out of the rods. 10^9 cells, 3·10^17 operations, 30 times the power of IBM Blue Gene/L « Frontier », ≈ 1 month, ≈ 1 Tb of storage, 2.5 Tb of memory.
2015: the whole reactor vessel. 10^10 cells, 5·10^18 operations, 500 times the power of IBM Blue Gene/L « Frontier », ≈ 1 month, ≈ 10 Tb of storage, 25 Tb of memory.
Limiting factors: power of the computer (2003); from 2006 onward, pre-processing not parallelized and mesh generation, with solver scalability and visualisation becoming limiting for the largest cases.
35 Open Source CFD International Conference 2009
  • 36. Thank you for your attention! 36 Open Source CFD International Conference 2009
  • 37. Additional Notes 37 Open Source CFD International Conference 2009
  • 38. Load imbalance (1/3) In this example, using 8 partitions (with METIS), we have the following local minima and maxima: Cells: 416 / 440 (6% imbalance) Cells + ghost cells: 469/519 (11% imbalance) Interior faces: 852/946 (11% imbalance) Most loops are on cells, but some are on cells + ghosts, and MatVec is in cells + faces 38 Open Source CFD International Conference 2009
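Assuming imbalance is measured as (max − min)/min, the figures above check out:
\[
\frac{440-416}{416}\approx 5.8\,\%,\qquad
\frac{519-469}{469}\approx 10.7\,\%,\qquad
\frac{946-852}{852}\approx 11.0\,\%.
\]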
  • 39. Load imbalance (2/3) If load imbalance increases with processor count, scalability decreases If load imbalance reaches a high value (say 30% to 50%) but does not increase, scalability is maintained, though some processor power is wasted Perfect balancing is impossible to reach, as different loops show different imbalance levels, and synchronizations may be required between these loops PCG uses MatVec and dot products Load imbalance might be reduced using weights for domain partitioning, with Cell weight = 1 + f(n_faces) 39 Open Source CFD International Conference 2009
  • 40. Load imbalance (3/3) Another possible source of load imbalance is different cache miss rates on different ranks Difficult to estimate a priori With otherwise balanced loops, if a processor has a cache miss every 300 instructions, and another a cache miss every 400 instructions, considering that the cost of a cache miss is at least 100 instructions, the corresponding imbalance reaches 20% 40 Open Source CFD International Conference 2009