Some things you need to know
Jongsu Kim
Fortran
Fortran….
• Still using Fortran 77, 90, or 95?
• Fortran 2003 and 2008 are already here, and Fortran 2015 is on the way.
• Some features have been deleted or declared obsolescent.
• We are using Fortran the wrong way.
What you shouldn’t use
Labeled Do Loops
do 100 ii=istart,ilast,istep
  isum = isum + ii
100 continue
EQUIVALENCE
Specifies that two or more objects in a scoping unit share the same storage units
character (len=3) :: C(2)
character (len=4) :: A,B
equivalence (A,C(1)), (B,C(2))
[Figure: storage units 1 to 7. A occupies units 1-4, B occupies units 4-7, C(1) occupies units 1-3, C(2) occupies units 4-6.]
COMMON
Blocks of physical storage accessed by any of
the scoping units in a program
COMMON /BLOCKA/ A,B,C(10,30)
COMMON I, J, K
ENTRY
An additional entry point, like a subroutine inside a subroutine
FIXED FORM SOURCE
Fortran 77 style (80 column restriction)
CHARACTER* form
replaced with CHARACTER(LEN=?)
NON-BLOCK DO CONSTRUCT
the DO range doesn't end in a CONTINUE or
END DO
What you shouldn’t use
Labeled Do Loops
Labels are unnecessary, and it is hard to remember what each number means. Moreover, we have the END DO and CYCLE statements.
EQUIVALENCE
EQUIVALENCE is also error-prone. It is hard to keep track of every storage location that the variables share.
Since COMMON and EQUIVALENCE are discouraged, BLOCK DATA is discouraged as well.
COMMON
Sharing many variables across the whole program is dangerous and error-prone.
ENTRY
It complicates the program, and modules and subroutines already cover its uses.
NON-BLOCK DO CONSTRUCT
It is hard to see where the DO loop ends.
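As a rough sketch of the alternative, the labeled loop from the earlier slide can be written as a block DO, with the shared loop bounds kept in a module instead of a COMMON block. The module name shared_data and the values of istart, ilast, and istep are illustrative assumptions, not part of the original slides.

```fortran
! Hypothetical module standing in for a COMMON block (names are illustrative).
module shared_data
  implicit none
  integer :: istart = 1, ilast = 9, istep = 2
end module shared_data

program modern_loop
  use shared_data, only: istart, ilast, istep
  implicit none
  integer :: ii, isum

  isum = 0
  ! Block DO construct: no statement label, the range ends at END DO.
  do ii = istart, ilast, istep
    isum = isum + ii
  end do
  print *, 'isum =', isum   ! 1 + 3 + 5 + 7 + 9 = 25
end program modern_loop
```

Only the names listed after "only:" are visible in the program, so it is clear which shared variables each scoping unit touches.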
What you might want to use – CYCLE , EXIT
• Avoid GOTO Statement
• Use CYCLE or EXIT statement
• CYCLE : skip the rest of the current iteration and continue with the next one
• EXIT : leave the loop
do i=1, 100
  x = real(i)
  y = sin(x)
  if (i == 20) exit
  z = cos(x)
enddo
EXIT: the first 19 iterations run in full. In the 20th iteration, y = sin(x) is executed and then the loop ends.
do i=1, 100
  x = real(i)
  y = sin(x)
  if (i == 20) cycle
  z = cos(x)
enddo
CYCLE: all 100 iterations run, but at i=20, z = cos(x) is not executed.
What you might want to use – CYCLE , EXIT
• Avoid GOTO statement
• Use CYCLE or EXIT statement with nested loop
• Constructs (DO, IF, CASE, etc.) may have names
outer: do j=1, 100
inner: do i=1, 100
x = real(i)
y = sin(x)
if (i > 20) exit outer
z = cos(x)
enddo inner
enddo outer
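exit outer (above): both loops end at i=21 of the first j iteration.
cycle outer (below): at i=21 the rest of the inner loop is skipped and the next j iteration starts, so z = cos(x) is computed only for i <= 20.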
outer: do j=1, 100
inner: do i=1, 100
x = real(i)
y = sin(x)
if (i > 20) cycle outer
z = cos(x)
enddo inner
enddo outer
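The difference between the two named-loop variants can be checked by counting how many times the statement after the test is reached. This is a minimal sketch: the counters n_exit and n_cycle and the construct names outer1, inner1, outer2, inner2 are introduced here for illustration only.

```fortran
program named_loops
  implicit none
  integer :: i, j, n_exit, n_cycle

  ! EXIT outer1: leaves both loops the first time i > 20.
  n_exit = 0
  outer1: do j = 1, 100
    inner1: do i = 1, 100
      if (i > 20) exit outer1
      n_exit = n_exit + 1
    end do inner1
  end do outer1

  ! CYCLE outer2: abandons the inner loop and moves on to the next j.
  n_cycle = 0
  outer2: do j = 1, 100
    inner2: do i = 1, 100
      if (i > 20) cycle outer2
      n_cycle = n_cycle + 1
    end do inner2
  end do outer2

  print *, 'exit outer : ', n_exit    ! 20
  print *, 'cycle outer: ', n_cycle   ! 100 * 20 = 2000
end program named_loops
```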
What you might want to use – WHERE
real, dimension(4) :: &
x = [ -1, 0, 1, 2 ], &
a = [ 5, 6, 7, 8 ]
...
where (x < 0)
a = -1.
end where
a : {-1.0, 6.0, 7.0, 8.0}
where (x /= 0)
a = 1. / a
elsewhere
a = 0.
end where
a : {-1.0, 0.0, 1.0/7.0, 1.0/8.0}
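To see what WHERE replaces, the sketch below applies the second masked assignment to a fresh copy of a twice, once with WHERE and once with an explicit loop, and compares the results. The array names a_where and a_loop are assumptions made for this example.

```fortran
program where_vs_loop
  implicit none
  real, dimension(4) :: x = [-1.0, 0.0, 1.0, 2.0]
  real, dimension(4) :: a_where, a_loop
  integer :: i

  a_where = [5.0, 6.0, 7.0, 8.0]
  a_loop  = a_where

  ! Masked array assignment.
  where (x /= 0)
    a_where = 1.0 / a_where
  elsewhere
    a_where = 0.0
  end where

  ! Equivalent element-by-element loop.
  do i = 1, size(x)
    if (x(i) /= 0) then
      a_loop(i) = 1.0 / a_loop(i)
    else
      a_loop(i) = 0.0
    end if
  end do

  print '(4f8.4)', a_where
  print *, 'same result: ', all(a_where == a_loop)
end program where_vs_loop
```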
What you might want to use – ANY
integer, parameter :: n = 100
real, dimension(n,n) :: a, b, c1, c2
c1 = my_matmul(a, b) ! home-grown function
c2 = matmul(a, b) ! built-in function
if (any(abs(c1 - c2) > 1.e-4)) then
  print *, 'There are significant differences'
endif
• ANY and WHERE replace explicit DO loops with a single array expression
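A complete version of the comparison might look like the following. The slides do not show my_matmul, so the triple loop here is only a hypothetical stand-in, and the module name my_linalg and the random test data are likewise assumptions.

```fortran
module my_linalg
  implicit none
contains
  ! Hypothetical home-grown product, standing in for my_matmul on the slide.
  function my_matmul(a, b) result(c)
    real, intent(in) :: a(:,:), b(:,:)
    real :: c(size(a,1), size(b,2))
    integer :: i, j, k
    c = 0.0
    do j = 1, size(b,2)
      do k = 1, size(a,2)
        do i = 1, size(a,1)
          c(i,j) = c(i,j) + a(i,k) * b(k,j)
        end do
      end do
    end do
  end function my_matmul
end module my_linalg

program any_demo
  use my_linalg, only: my_matmul
  implicit none
  integer, parameter :: n = 100
  real, dimension(n,n) :: a, b, c1, c2

  call random_number(a)
  call random_number(b)
  c1 = my_matmul(a, b)   ! home-grown function
  c2 = matmul(a, b)      ! built-in function

  ! One logical reduction instead of a double loop with a flag.
  if (any(abs(c1 - c2) > 1.e-4)) then
    print *, 'There are significant differences'
  else
    print *, 'Results agree to within 1.e-4'
  end if
end program any_demo
```

Which branch is taken depends on single-precision rounding for the random inputs, so the tolerance may need adjusting for other sizes.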
What you might want to use – DO CONCURRENT
• DO CONCURRENT tells the compiler that the iterations are independent. It allows (requests) vectorization or auto-parallelization.
do concurrent (i=1:m)
  call dosomething()
end do
• Vectorization : processes one operation on multiple pairs of operands at once
DO i=1,1024
  C(i) = A(i) * B(i)
END DO
is executed as if it were written
DO i=1,1024,4
  C(i:i+3) = A(i:i+3) * B(i:i+3)
END DO
• Simple example of auto-parallelization. With ifort, enable the -parallel option.
• Restrictions: no data dependencies between iterations, no EXIT or RETURN statement, no CYCLE that targets an outer loop, and only PURE procedures may be called.
• Can be used together with OpenMP.
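A minimal sketch of a loop that satisfies these restrictions is shown below. The function scale_product is a made-up example of a pure procedure. Whether the compiler actually vectorizes or parallelizes the loop depends on the compiler and its options.

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 1024
  real :: a(n), b(n), c(n)
  integer :: i

  call random_number(a)
  call random_number(b)

  ! Each iteration writes only c(i) and reads only a(i) and b(i),
  ! so the iterations may run in any order.
  do concurrent (i = 1:n)
    c(i) = scale_product(a(i), b(i))
  end do

  print *, 'max difference vs array expression: ', maxval(abs(c - 2.0*a*b))

contains
  ! Procedures referenced inside DO CONCURRENT must be pure.
  pure function scale_product(x, y) result(z)
    real, intent(in) :: x, y
    real :: z
    z = 2.0 * x * y
  end function scale_product
end program do_concurrent_demo
```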
For More..
• Read Fortran 2008 Standard
• http://www.j3-fortran.org/doc/year/10/10-007.pdf
• More recent working draft for Fortran 2015 (still in progress)
• http://j3-fortran.org/doc/year/15/15-007.pdf
• Easy to read documents
• The new features of Fortran 2008 : ftp://ftp.nag.co.uk/sc22wg5/N1801-N1850/N1828.pdf
• Modern Programming Languages: Fortran90/95/2003/2008 :
https://www.tacc.utexas.edu/documents/13601/162125/fortran_class.pdf
Build System (MakeFile)
Build?
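• Build : the process of turning source code into executable files.
• Compiler : the tool that compiles source files into object files. Linker : the tool that links object files and libraries into an executable.
• ifort, gcc, gfortran, and so on are combined tools for compiling and linking.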
[Figure: Source Code1.f, Source Code2.f, Source Code3.f (readable) -> Compile -> Source Code1.o, Source Code2.o, Source Code3.o (unreadable) -> Link, together with Libraries (FFTW..) -> a.out]
Makefile?
• make runs all of the compile and link jobs automatically. A Makefile is the build script.
• make (actually gmake) is one of many such tools, which are collectively called build systems.
• Visual Studio has its own build system, so it does not use a Makefile.
1. Command-line
$ gcc -o hellomake hellomake.c hellofunc.c -I.
2. Simple Makefile (1)
hellomake: hellomake.c hellofunc.c
	gcc -o hellomake hellomake.c hellofunc.c -I.
• “hellomake:” : rule name (the target)
• “hellomake.c hellofunc.c” : dependencies
• “gcc …” : actual command
• Running just “make” executes the first rule defined in the Makefile
Command line for the Makefile
$ make
or
$ make hellomake
Makefile?
3. Simple Makefile (3)
Add constants
CC=gcc
CFLAGS=-I.
hellomake: hellomake.o hellofunc.o
	$(CC) -o hellomake hellomake.o hellofunc.o -I.
• “CC=gcc” : C compiler
• “CFLAGS” : list of flags to pass to the compilation command
• For Fortran, use “FC” instead of “CC” and “FFLAGS” instead of “CFLAGS”
• The tab indent before the command line (“$(CC)”) is important!
$ make
or
$ make hellomake
Makefile?
4. Simple Makefile (4)
CC=gcc
CFLAGS=-I.
DEPS = hellomake.h
hellomake: hellomake.o hellofunc.o
	$(CC) -o hellomake hellomake.o hellofunc.o -I.
%.o: %.c $(DEPS)
	$(CC) -c $< $(CFLAGS)
The pattern rule builds every .o file from the .c file with the same name. $@ and $< are automatic variables in a Makefile.
• Rule %.o : rule for compilation. Rule hellomake : rule for linking.
• $@ : the name of the file to be made (e.g. hellomake for rule hellomake)
• $< : the name of the first prerequisite (hellomake.o is the first prerequisite of rule hellomake)
• $^ : the names of all the prerequisites, with spaces between them
• $* : the stem shared by the target and the prerequisite in a pattern rule (e.g. hellomake when hellomake.o is built from hellomake.c)
$ make
or
$ make hellomake
Compiler & Linker Options
FFLAGS=-O3 -r8 -openmp -I /home/astromece/usr/fftw/include
LIBS=-L/home/astromeca/usr/lib -lfftw3 -lm
Compiler Options and Linker Options
• -O3 : optimization level (O1 : code size optimization, O2 : general optimization (default), O3 : aggressive optimization)
• -r8 : the default real type is double precision (8 bytes = 64 bits per real)
• -I : specify the include directory. Include files : .h files (declarations)
• -L : specify the library directory. Library files : .so or .a
• -lfftw3 : link with the fftw3 library
• -lm : link with the math library (needed by several math functions)
Compiler & Linker Options
Recommend options
• -heap-arrays [size] : puts automatic arrays and temporary arrays larger than [size] KB on the heap instead of the stack. The effect is the same as using the ALLOCATE statement.
• -ax[code] : specify the CPU architecture. DGIST, Boolt : AVX, CSE Server(OMP) : SSE4.1, CSE Server(SMP) : SSE4.2
• -O2 : before enabling -O3, compare the results obtained with -O2 and -O3. Sometimes -O3 produces different results.
• -parallel : enable auto-parallelization. Turn it on if you use DO CONCURRENT.
• -free : free-form source (f90 style). ifort compiles a .f file as Fortran 77 (fixed form) by default. To compile a file with the .f suffix as Fortran 90 or later, enable this option.
• $ man ifort gives a lot of additional information.
Debug vs Release
• The -g option (for the debugger) and the -check option (checks array bounds and so on) help reduce errors. However, they add extra code, which slows the program, and optimization is turned off automatically.
• Once you are confident the code has no errors and you want production results, enable optimization and remove the -g and -check options.
MKL BLAS & CG Method
Intel MKL(Math Kernel Library) and BLAS
Intel MKL
• A library of optimized math routines for science, engineering, and financial applications.
• Basic matrix and vector functions are included.
• No separate installation is needed, just link the library.
BLAS
• Basic Linear Algebra Subprograms
• A set of low-level routines for common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication
• It has a single interface but many implementations: ATLAS, MKL, OpenBLAS, GotoBLAS, and so on.
• I will use MKL BLAS because it is easy to compile and well documented.
• It is already parallelized. Turning on one option gives parallelism without writing any OpenMP code. (MPI parallelism is not implemented.)
I will show how to build the CG method using MKL BLAS line by line.
Sparse Matrix Format
• Before starting with the BLAS library functions, we need to consider how to construct the matrix 𝐴 in 𝐴𝑥 = 𝑏.
1 7 0 0
0 2 8 0
5 0 3 9
0 6 0 4
9 entries (non zero entries)
row offsets : 1 3 5 8 10 (the last entry, 10, indicates the end)
column indices : 1 2 2 3 1 3 4 2 4
values : 1 7 2 8 5 3 9 6 4
Sparse matrix
• Storing the full A matrix, zeros included, requires 16 * 8 bytes.
• The sparse (CSR) format requires 23 * 8 bytes (5 row offsets + 9 column indices + 9 values).
• Inefficient? No. For a large A matrix, such as (𝑛𝑥 ⋅ 𝑛𝑦) × (𝑛𝑥 ⋅ 𝑛𝑦), CSR is far more efficient.
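The construction of the three CSR arrays can be sketched in a few lines. This example builds them from the dense 4 × 4 matrix with one-based indices and then computes 𝐴𝑥 from the stored non-zeros only. The variable names (dense, row_ptr, col_idx, values) are chosen for this sketch and do not come from MKL.

```fortran
program csr_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4
  real(dp) :: dense(n,n), x(n), y(n)
  real(dp), allocatable :: values(:)
  integer, allocatable :: col_idx(:)
  integer :: row_ptr(n+1)
  integer :: i, j, k, nnz

  ! The 4x4 example matrix, filled row by row.
  dense(1,:) = [1, 7, 0, 0]
  dense(2,:) = [0, 2, 8, 0]
  dense(3,:) = [5, 0, 3, 9]
  dense(4,:) = [0, 6, 0, 4]

  nnz = count(dense /= 0)
  allocate(values(nnz), col_idx(nnz))

  ! Build the three CSR arrays with one-based indices.
  k = 0
  do i = 1, n
    row_ptr(i) = k + 1
    do j = 1, n
      if (dense(i,j) /= 0) then
        k = k + 1
        values(k)  = dense(i,j)
        col_idx(k) = j
      end if
    end do
  end do
  row_ptr(n+1) = nnz + 1      ! marks the end of the last row

  print '(a,5i3)',   'row offsets   :', row_ptr
  print '(a,9i3)',   'column indices:', col_idx
  print '(a,9f4.0)', 'values        :', values

  ! y = A x using only the stored non-zeros.
  x = 1
  do i = 1, n
    y(i) = 0
    do k = row_ptr(i), row_ptr(i+1) - 1
      y(i) = y(i) + values(k) * x(col_idx(k))
    end do
  end do
  print '(a,4f6.1)', 'A*x =', y    ! row sums: 8, 10, 17, 10
end program csr_demo
```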
Which BLAS Library Functions Are Required?
• mkl_dcsrgemv : computes the matrix-vector product of a sparse general matrix stored in the CSR format (3-array variation) with one-based indexing, in double precision. Used for the 𝐴𝑥 computation.
  • call mkl_dcsrgemv(transa, m, a, ia, ja, x, y)
  • transa : selects 𝐴𝑥 (transa=‘N’ or ‘n’) or 𝐴’𝑥 (transa=‘T’ or ‘t’ or ‘C’ or ‘c’).
  • m : # of rows of A
  • a : values array of A in CSR format
  • ia : row offset array of A in CSR format
  • ja : column indices array of A in CSR format
  • x : x vector
  • y : output (𝐴𝑥)
• dcopy : copies vector 𝑥 to vector 𝑦. 𝑦 = 𝑥
  • call dcopy(n, x, incx, y, incy)
  • n : # of elements in vectors 𝑥 and 𝑦.
  • x : input, 𝑥 vector
  • y : output, 𝑦 vector
  • incx, incy : increments between elements of 𝑥 and 𝑦 (1 for contiguous vectors)
Which BLAS Library Functions Are Required?
• ddot : computes a vector-vector dot product. 𝑥 ⋅ 𝑦
  • not a subroutine, it’s a function.
  • ddot(n, x, incx, y, incy), or dot(x, y) with the Fortran 95 interface
  • n : # of elements in vectors 𝑥 and 𝑦.
  • x, y : 𝑥, 𝑦 vectors
• daxpy : computes a vector-scalar product and adds the result to a vector. DAXPY : Double-precision A·X Plus Y
  • 𝑦 = 𝑎 ⋅ 𝑥 + 𝑦
  • call daxpy(n, a, x, incx, y, incy)
  • n : # of elements in vectors 𝑥 and 𝑦.
  • a : scalar 𝑎
  • x : input, 𝑥 vector
  • y : input and output, 𝑦 vector
• dnrm2 : computes the Euclidean norm of a vector. ‖𝑥‖ = √(𝑥 ⋅ 𝑥)
  • not a subroutine, it’s a function.
  • dnrm2(n, x, incx), or nrm2(x) with the Fortran 95 interface
  • n : # of elements in vector 𝑥.
  • x : input, 𝑥 vector
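To show where each of these routines would appear in a CG iteration, here is a self-contained sketch that uses Fortran intrinsics in place of the library calls, so it runs without MKL. The comments name the routine each line would correspond to. The test matrix (tridiagonal -1, 2, -1), the tolerance, and the helper csr_matvec are assumptions made for this example, not the author's implementation.

```fortran
program cg_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 5, nnz = 3*n - 2
  real(dp) :: a(nnz), x(n), b(n), r(n), p(n), ap(n)
  integer  :: ia(n+1), ja(nnz)
  real(dp) :: alpha, beta, rr_old, rr_new
  integer  :: i, k, iter

  ! Assumed test matrix: tridiagonal (-1, 2, -1), stored in one-based CSR.
  k = 0
  do i = 1, n
    ia(i) = k + 1
    if (i > 1) then
      k = k + 1; a(k) = -1.0_dp; ja(k) = i - 1
    end if
    k = k + 1; a(k) = 2.0_dp; ja(k) = i
    if (i < n) then
      k = k + 1; a(k) = -1.0_dp; ja(k) = i + 1
    end if
  end do
  ia(n+1) = nnz + 1

  b = 1.0_dp
  x = 0.0_dp
  r = b                          ! r = b - A*x with x = 0   (dcopy)
  p = r                          ! (dcopy)
  rr_old = dot_product(r, r)     ! (ddot)

  do iter = 1, n
    call csr_matvec(a, ia, ja, p, ap)      ! (mkl_dcsrgemv)
    alpha = rr_old / dot_product(p, ap)    ! (ddot)
    x = x + alpha * p                      ! (daxpy)
    r = r - alpha * ap                     ! (daxpy with -alpha)
    rr_new = dot_product(r, r)
    if (sqrt(rr_new) < 1.0e-12_dp) exit    ! norm of r (dnrm2)
    beta = rr_new / rr_old
    p = r + beta * p
    rr_old = rr_new
  end do

  call csr_matvec(a, ia, ja, x, ap)
  print '(a,es10.3)', 'residual norm: ', norm2(b - ap)
  print '(a,5f8.4)', 'x = ', x

contains
  ! Plain CSR matrix-vector product, w = A*v (one-based indices).
  subroutine csr_matvec(val, rowptr, colidx, v, w)
    real(dp), intent(in)  :: val(:), v(:)
    integer,  intent(in)  :: rowptr(:), colidx(:)
    real(dp), intent(out) :: w(:)
    integer :: row, kk
    do row = 1, size(w)
      w(row) = 0.0_dp
      do kk = rowptr(row), rowptr(row+1) - 1
        w(row) = w(row) + val(kk) * v(colidx(kk))
      end do
    end do
  end subroutine csr_matvec
end program cg_sketch
```

For this test system the exact solution is 2.5, 4.0, 4.5, 4.0, 2.5, which gives a quick check of the printed x.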

Hospital management system project report.pdf
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 

Fortran & Link with Library & Brief Explanation of MKL BLAS

  • 1. Some things you need to know Jongsu Kim
  • 3. Fortran….
      • Still using Fortran 77, 90, or 95?
      • Fortran 2003 and 2008 are already here, and Fortran 2015 (since published as Fortran 2018) is on the way.
      • Some features will be deleted or declared obsolescent.
      • We are using Fortran the wrong way.
  • 4. What you shouldn’t use
      • Labeled DO loops
            do 100 ii=istart,ilast,istep
              isum = isum + ii
        100 continue
      • EQUIVALENCE: specifies the sharing of storage units by two or more objects in a scoping unit.
            character (len=3) :: C(2)
            character (len=4) :: A,B
            equivalence (A,C(1)), (B,C(2))
        Over storage units 1 to 7, A occupies units 1-4, B occupies 4-7, C(1) occupies 1-3, and C(2) occupies 4-6, so A, B, and C(2) all overlap at unit 4.
      • COMMON: blocks of physical storage that can be accessed by any of the scoping units in a program.
            COMMON /BLOCKA/ A,B,C(10,30)
            COMMON I, J, K
      • ENTRY: subroutine-like entry points inside a subroutine.
      • Fixed-form source: Fortran 77 style (80-column lines, with statements confined to columns 7-72).
      • CHARACTER* form: replaced with CHARACTER(LEN=?).
      • Non-block DO construct: the DO range doesn't end in a CONTINUE or END DO.
  • 5. What you shouldn’t use
      • Labeled DO loops: the label is unnecessary, and it is hard to remember what each number means. We have END DO and CYCLE instead.
      • EQUIVALENCE: error-prone. It is hard to keep track of every storage position these variables share. Since COMMON and EQUIVALENCE are both discouraged, BLOCK DATA should be avoided as well.
      • COMMON: sharing many variables across the whole program is dangerous and error-prone.
      • ENTRY: it complicates the program, and modules and subroutines already cover the same need.
      • Non-block DO construct: it is hard to see where the DO loop ends, which makes the code hard to maintain.
  • 6. What you might want to use: CYCLE, EXIT
      • Avoid the GOTO statement.
      • Use the CYCLE or EXIT statement.
      • CYCLE: skip the rest of the current iteration and continue with the next one.
      • EXIT: leave the loop.
      • EXIT example:
            do i=1, 100
              x = real(i)
              y = sin(x)
              if (i == 20) exit
              z = cos(x)
            enddo
        Nineteen iterations complete in full; in the 20th iteration, y = sin(x) is executed and then the loop exits.
      • CYCLE example:
            do i=1, 100
              x = real(i)
              y = sin(x)
              if (i == 20) cycle
              z = cos(x)
            enddo
        All 100 iterations run, but at i=20, z = cos(x) is not executed.
  • 7. What you might want to use: CYCLE, EXIT
      • Avoid the GOTO statement.
      • Use the CYCLE or EXIT statement with nested loops.
      • Constructs (DO, IF, CASE, etc.) may have names.
      • EXIT with a construct name:
            outer: do j=1, 100
              inner: do i=1, 100
                x = real(i)
                y = sin(x)
                if (i > 20) exit outer
                z = cos(x)
              enddo inner
            enddo outer
        Both loops are exited at i=21.
      • CYCLE with a construct name:
            outer: do j=1, 100
              inner: do i=1, 100
                x = real(i)
                y = sin(x)
                if (i > 20) cycle outer
                z = cos(x)
              enddo inner
            enddo outer
        At i=21, z = cos(x) and the rest of the inner loop are skipped, and the next iteration of outer begins.
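The difference between the two named forms is easiest to see by counting how many times the inner loop body runs to completion. The following is a minimal sketch; the program name and the counters n_exit and n_cycle are introduced here for illustration and do not appear on the slides.

```fortran
program named_loops
  implicit none
  integer :: i, j, n_exit, n_cycle

  ! EXIT outer_exit: leaves both loops the first time i > 20.
  n_exit = 0
  outer_exit: do j = 1, 100
    inner_exit: do i = 1, 100
      if (i > 20) exit outer_exit
      n_exit = n_exit + 1            ! reached only for i = 1..20, j = 1
    end do inner_exit
  end do outer_exit

  ! CYCLE outer_cycle: abandons the inner loop and starts the next j.
  n_cycle = 0
  outer_cycle: do j = 1, 100
    inner_cycle: do i = 1, 100
      if (i > 20) cycle outer_cycle
      n_cycle = n_cycle + 1          ! reached for i = 1..20 on every j
    end do inner_cycle
  end do outer_cycle

  print '(a,i0)', 'exit outer:  ', n_exit
  print '(a,i0)', 'cycle outer: ', n_cycle
end program named_loops
```

With EXIT the counter stops at 20 (one pass of the inner loop); with CYCLE it reaches 2000 (20 inner iterations for each of the 100 outer iterations).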
  • 8. What you might want to use: WHERE
            real, dimension(4) :: &
              x = [ -1, 0, 1, 2 ], &
              a = [ 5, 6, 7, 8 ]
            ...
            where (x < 0)
              a = -1.
            end where
        Result: a = {-1.0, 6.0, 7.0, 8.0}
            where (x /= 0)
              a = 1. / a
            elsewhere
              a = 0.
            end where
        Result: a = {-1.0, 0.0, 1.0/7.0, 1.0/8.0}
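The two WHERE blocks above can be run back to back as a complete program. This is a small sketch using the same x and a as the slide; only the program name and the print statements are added.

```fortran
program where_demo
  implicit none
  real :: x(4) = [-1.0, 0.0, 1.0, 2.0]
  real :: a(4) = [ 5.0, 6.0, 7.0, 8.0]

  ! Masked assignment: only elements where x < 0 are touched.
  where (x < 0)
    a = -1.0
  end where
  print '(4f8.4)', a      ! a is now -1, 6, 7, 8

  ! The second block works on the already modified a.
  where (x /= 0)
    a = 1.0 / a           ! applied where x is non-zero
  elsewhere
    a = 0.0               ! applied where x is zero
  end where
  print '(4f8.4)', a      ! a is now -1, 0, 1/7, 1/8
end program where_demo
```

The division is evaluated only for elements selected by the mask, so the elements that take the ELSEWHERE branch never go through 1.0 / a.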
  • 9. What you might want to use: ANY
            integer, parameter :: n = 100
            real, dimension(n,n) :: a, b, c1, c2
            c1 = my_matmul(a, b) ! home-grown function
            c2 = matmul(a, b)    ! built-in function
            if (any(abs(c1 - c2) > 1.e-4)) then
              print *, 'There are significant differences'
            endif
      • ANY and WHERE remove the need for explicit DO loops over array elements.
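The slide does not show my_matmul, so the sketch below substitutes a plain triple loop for it (an assumption made purely for illustration) and uses a smaller n so the program is quick to run.

```fortran
program any_demo
  implicit none
  integer, parameter :: n = 4
  real :: a(n,n), b(n,n), c1(n,n), c2(n,n)
  integer :: i, j, k

  call random_number(a)
  call random_number(b)

  ! Hand-written product standing in for the slide's my_matmul.
  c1 = 0.0
  do j = 1, n
    do k = 1, n
      do i = 1, n
        c1(i,j) = c1(i,j) + a(i,k) * b(k,j)
      end do
    end do
  end do

  c2 = matmul(a, b)       ! built-in function

  ! ANY reduces the whole logical array to a single value,
  ! so no explicit loop over elements is needed for the comparison.
  if (any(abs(c1 - c2) > 1.0e-4)) then
    print *, 'There are significant differences'
  else
    print *, 'Results agree within tolerance'
  end if
end program any_demo
```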
  • 10. What you might want to use: DO CONCURRENT
      • Vectorization: processing one operation on multiple pairs of operands at once. A scalar loop such as
            DO i=1,1024
              C(i) = A(i) * B(i)
            END DO
        is effectively executed as
            DO i=1,1024,4
              C(i:i+3) = A(i:i+3) * B(i:i+3)
            END DO
      • DO CONCURRENT is a simple way to allow (request) vectorization and auto-parallelization:
            do concurrent (i=1:m)
              call dosomething()
            end do
      • To have the loop auto-parallelized, enable the -parallel option.
      • Restrictions: no data dependencies between iterations, no EXIT statement, no RETURN statement, and no CYCLE to an outer loop. Procedures called inside the construct must be pure.
      • Can be used together with OpenMP.
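A minimal, self-contained sketch of a DO CONCURRENT loop follows. The function scale_product is hypothetical and stands in for the slide's dosomething; it is declared pure because only pure procedures may be referenced inside the construct.

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 1024
  real :: a(n), b(n), c(n)
  integer :: i

  call random_number(a)
  call random_number(b)

  ! Each iteration writes only c(i) and reads only a(i), b(i),
  ! so the iterations are independent and may run in any order.
  do concurrent (i = 1:n)
    c(i) = scale_product(a(i), b(i))
  end do

  ! Compare against the equivalent whole-array expression.
  if (maxval(abs(c - 2.0 * a * b)) <= epsilon(1.0)) then
    print *, 'ok'
  else
    print *, 'mismatch'
  end if

contains

  ! Hypothetical helper: pure, so it is legal inside DO CONCURRENT.
  pure function scale_product(x, y) result(z)
    real, intent(in) :: x, y
    real :: z
    z = 2.0 * x * y
  end function scale_product

end program do_concurrent_demo
```

Whether the compiler actually vectorizes or parallelizes the loop depends on the compiler and its options; the construct only states that doing so is safe.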
  • 11. For more
      • Read the Fortran 2008 standard: http://www.j3-fortran.org/doc/year/10/10-007.pdf
      • A more recent working document for Fortran 2015 (still in progress): http://j3-fortran.org/doc/year/15/15-007.pdf
      • Easier-to-read documents:
          • The new features of Fortran 2008: ftp://ftp.nag.co.uk/sc22wg5/N1801-N1850/N1828.pdf
          • Modern Programming Languages: Fortran90/95/2003/2008: https://www.tacc.utexas.edu/documents/13601/162125/fortran_class.pdf
  • 13. Build?
      • The process of turning source code into an executable file is called a build.
      • Compiler: the tool that compiles. Linker: the tool that links.
      • ifort, gcc, gfortran, and so on are combined tools that both compile and link.
      • Flow: Source Code1.f, Source Code2.f, Source Code3.f (human-readable) → Compile → Source Code1.o, Source Code2.o, Source Code3.o (not human-readable) → Link, together with libraries (FFTW..) → a.out
  • 14. Makefile?
      • make performs all of the compile and link jobs automatically. A Makefile is a build script.
      • make (actually gmake) is one of many such tools, which are collectively called build systems.
      • Visual Studio has its own build system, so it doesn’t use a Makefile.
      • 1. Command line
            $ gcc -o hellomake hellomake.c hellofunc.c -I.
      • 2. Simple Makefile (1)
            hellomake: hellomake.c hellofunc.c
                gcc -o hellomake hellomake.c hellofunc.c -I.
          • “hellomake:” : rule name
          • “hellomake.c hellofunc.c” : dependencies
          • “gcc …” : actual command
          • Running just “make” executes the first rule defined in the Makefile.
      • Command line
            $ make or $ make hellomake
  • 15. Makefile?
      • 3. Simple Makefile (3): add constants
            CC=gcc
            CFLAGS=-I.
            hellomake: hellomake.o hellofunc.o
                $(CC) -o hellomake hellomake.o hellofunc.o -I.
          • “CC=gcc” : C compiler
          • “CFLAGS” : list of flags to pass to the compilation command
          • For Fortran, use “FC” instead of “CC” and “FFLAGS” instead of “CFLAGS”.
          • The command line (“$(CC) …”) must be indented with a tab. This is important!
      • Command line
            $ make or $ make hellomake
  • 16. Makefile?
      • 4. Simple Makefile (4): automatically find .c files and make a compilation rule (.o) for each
            CC=gcc
            CFLAGS=-I.
            DEPS = hellomake.h
            hellomake: hellomake.o hellofunc.o
                $(CC) -o hellomake hellomake.o hellofunc.o -I.
            %.o: %.c $(DEPS)
                $(CC) -c $< $(CFLAGS)
      • $@ and $< are special macros (automatic variables) in a Makefile.
          • Rule %.o is the rule for compilation; rule hellomake is the rule for linking.
          • $@ : the name of the file to be made (e.g. hellomake for rule hellomake).
          • $< : the name of the first prerequisite (hellomake.o is the first prerequisite of rule hellomake).
          • $^ : the names of all the prerequisites, with spaces between them.
          • $* : the stem shared by the target and its prerequisite in a pattern rule (e.g. hellomake when hellomake.o is built from hellomake.c).
      • Command line
            $ make or $ make hellomake
  • 17. Compiler & Linker Options
            FFLAGS=-O3 -r8 -openmp -I /home/astromece/usr/fftw/include
            LIBS=-L/home/astromeca/usr/lib -lfftw3 -lm
      • Compiler options and linker options
          • -O3 : optimization level (O1: code size optimization, O2: general optimization (default), O3: aggressive optimization)
          • -r8 : the real type is double precision (8 bytes = 64 bits per real)
          • -I : specify an include directory. Include files: .h files (declarations)
          • -L : specify a library directory. Library files: .so or .a
          • -lfftw3 : link with the fftw3 library
          • -lm : link with the math library (needed for several math functions)
  • 18. Compiler & Linker Options
      • Recommended options
          • -heap-arrays [numbers] : puts automatic arrays and temporary arrays larger than [numbers] KB on the heap instead of the stack. Same effect as using the allocate statement.
          • -ax[code] : specify the CPU architecture. DGIST, Boolt: AVX; CSE Server (OMP): SSE4.1; CSE Server (SMP): SSE4.2
          • -O2 : before enabling -O3, compare the results obtained with -O2 and -O3. Sometimes -O3 produces different results.
          • -parallel : enable auto-parallelized code. Turn it on if you use DO CONCURRENT.
          • -free : free-form source (f90 style). ifort compiles .f files as Fortran 77 by default; to compile a file with the .f suffix as Fortran 90 or higher, enable this option.
          • $ man ifort gives a lot of additional information.
      • Debug vs. Release
          • The -g option (to use a debugger) and the -check option (checks array bounds and so on) help reduce errors. However, they add extra code, which slows the program down, and they turn off optimization automatically.
          • If you are sure that you have no errors and just want results, enable optimization and remove the -g and -check options.
  • 19. MKL BLAS & CG Method
  • 20. Intel MKL (Math Kernel Library) and BLAS
      • Intel MKL
          • A library of optimized math routines for science, engineering, and financial applications.
          • Includes the basic matrix and vector functions.
          • No separate installation is needed; just link with the library.
      • BLAS
          • Basic Linear Algebra Subprograms.
          • A set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication.
          • The interface is the same across implementations: ATLAS, MKL, OpenBLAS, GotoBLAS, and so on.
          • I will use MKL BLAS because it is easy to compile against and well documented.
          • It is already parallelized, so turning on a single option gives you parallelism without writing any OpenMP code. (MPI parallelism is not implemented.)
      • I will show how to build the CG method using MKL BLAS, line by line.
  • 21-30. Sparse Matrix Format (slides 21 to 30 build up the same figure one entry at a time; the completed figure is given here)
      • Before starting with the BLAS library functions, we need to consider how to construct the matrix 𝐴 in 𝐴𝑥 = 𝑏.
      • Matrix 𝐴:
            1 7 0 0
            0 2 8 0
            5 0 3 9
            0 6 0 4
      • The matrix has 9 non-zero entries, stored row by row in three arrays:
            row offsets    : 1 3 5 8 10
            column indices : 1 2 2 3 1 3 4 2 4
            values         : 1 7 2 8 5 3 9 6 4
      • Each row offset is the position in the values array where that row starts. The last row offset (10) indicates the end: it is one past the last entry.
  • 31. Sparse matrix
      • If the matrix 𝐴 is stored densely, zeros included, 16 * 8 bytes are required.
      • The sparse (CSR) form requires 23 entries (5 row offsets + 9 column indices + 9 values), i.e. 23 * 8 bytes.
      • Inefficient? No. For a large matrix 𝐴, such as one of size (𝑛𝑥 ⋅ 𝑛𝑦) × (𝑛𝑥 ⋅ 𝑛𝑦), CSR is far more efficient.
            Matrix 𝐴:
            1 7 0 0
            0 2 8 0
            5 0 3 9
            0 6 0 4
            row offsets    : 1 3 5 8 10
            column indices : 1 2 2 3 1 3 4 2 4
            values         : 1 7 2 8 5 3 9 6 4
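The construction walked through on the preceding slides can be sketched in a few lines of Fortran. The program below converts the 4 × 4 example matrix to one-based CSR arrays; the names dense, row_ptr, col_idx, and values are chosen here for illustration and are not from the slides.

```fortran
program dense_to_csr
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4
  real(dp) :: dense(n,n)
  real(dp), allocatable :: values(:)
  integer, allocatable :: col_idx(:)
  integer :: row_ptr(n+1)
  integer :: i, j, k, nnz

  ! reshape fills column by column, so transpose lets us type
  ! the matrix row by row as it appears on the slide.
  dense = transpose(reshape([ &
       1.0_dp, 7.0_dp, 0.0_dp, 0.0_dp, &
       0.0_dp, 2.0_dp, 8.0_dp, 0.0_dp, &
       5.0_dp, 0.0_dp, 3.0_dp, 9.0_dp, &
       0.0_dp, 6.0_dp, 0.0_dp, 4.0_dp ], [n, n]))

  nnz = count(dense /= 0.0_dp)
  allocate(values(nnz), col_idx(nnz))

  k = 0
  do i = 1, n
    row_ptr(i) = k + 1               ! where row i starts in values
    do j = 1, n
      if (dense(i,j) /= 0.0_dp) then
        k = k + 1
        values(k)  = dense(i,j)
        col_idx(k) = j
      end if
    end do
  end do
  row_ptr(n+1) = nnz + 1             ! one past the last entry

  print '(a,*(1x,i0))',   'row offsets:   ', row_ptr
  print '(a,*(1x,i0))',   'column indices:', col_idx
  print '(a,*(1x,f3.1))', 'values:        ', values
end program dense_to_csr
```

In a real solver the dense matrix would normally never be formed; the CSR arrays would be filled directly from the discretization. The dense intermediate is used here only to make the mapping visible.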
  • 32. Which BLAS library functions are required?
      • mkl_dcsrgemv : computes the matrix-vector product of a sparse general matrix stored in the CSR format (3-array variation) with one-based indexing, in double precision. Used for the 𝐴𝑥 computation.
          • call mkl_dcsrgemv(transa, m, a, ia, ja, x, y)
          • transa : selects 𝐴𝑥 (transa = ‘N’ or ‘n’) or 𝐴’𝑥 (transa = ‘T’, ‘t’, ‘C’, or ‘c’)
          • m : number of rows of 𝐴
          • a : values array of 𝐴 in CSR format
          • ia : row offset array of 𝐴 in CSR format
          • ja : column indices array of 𝐴 in CSR format
          • x : 𝑥 vector
          • y : output (𝐴𝑥)
      • dcopy : copies a vector, 𝑦 = 𝑥
          • call dcopy(n, x, incx, y, incy)
          • n : number of elements in vectors 𝑥 and 𝑦
          • x : input, 𝑥 vector
          • y : output, 𝑦 vector
          • incx, incy : strides through 𝑥 and 𝑦 (1 for contiguous vectors)
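To make the role of a, ia, and ja concrete, here is a plain-Fortran reference version of the non-transposed product 𝑦 = 𝐴𝑥 for one-based CSR arrays. It is a sketch for understanding only: csr_gemv is a hypothetical helper written for this example, not the MKL routine, and it does not handle the transposed case.

```fortran
program csr_matvec_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: m = 4
  ! CSR arrays of the 4 x 4 example matrix (one-based indexing).
  real(dp) :: a(9)  = real([1, 7, 2, 8, 5, 3, 9, 6, 4], dp)
  integer  :: ia(5) = [1, 3, 5, 8, 10]
  integer  :: ja(9) = [1, 2, 2, 3, 1, 3, 4, 2, 4]
  real(dp) :: x(m), y(m)

  x = 1.0_dp                         ! A*x then gives the row sums
  call csr_gemv(m, a, ia, ja, x, y)
  print '(4f6.1)', y

contains

  ! Hypothetical reference routine: y = A*x for CSR storage.
  subroutine csr_gemv(m, a, ia, ja, x, y)
    integer,  intent(in)  :: m, ia(:), ja(:)
    real(dp), intent(in)  :: a(:), x(:)
    real(dp), intent(out) :: y(:)
    integer :: i, k
    do i = 1, m
      y(i) = 0.0_dp
      ! Entries of row i occupy positions ia(i) .. ia(i+1)-1.
      do k = ia(i), ia(i+1) - 1
        y(i) = y(i) + a(k) * x(ja(k))
      end do
    end do
  end subroutine csr_gemv

end program csr_matvec_demo
```

With 𝑥 set to all ones, the result is the vector of row sums of the example matrix: 8, 10, 17, 10.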
  • 33. Which BLAS library functions are required?
      • ddot : computes a vector-vector dot product, 𝑥 ⋅ 𝑦
          • It is a function, not a subroutine.
          • ddot(n, x, incx, y, incy) (Fortran 95 interface: dot(x, y))
          • x, y : 𝑥 and 𝑦 vectors
      • daxpy : computes a vector-scalar product and adds the result to a vector. DAXPY: Double-precision A·X Plus Y
          • 𝑦 = 𝑎 ⋅ 𝑥 + 𝑦
          • call daxpy(n, a, x, incx, y, incy)
          • n : number of elements in vectors 𝑥 and 𝑦
          • a : scalar 𝑎
          • x : input, 𝑥 vector
          • y : input and output, 𝑦 vector
      • dnrm2 : computes the Euclidean norm of a vector, ‖𝑥‖₂
          • It is a function, not a subroutine.
          • dnrm2(n, x, incx) (Fortran 95 interface: nrm2(x))
          • n : number of elements in vector 𝑥
          • x : input, 𝑥 vector
      • In all of these, incx and incy are the strides through 𝑥 and 𝑦 (1 for contiguous vectors).
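The routines listed above are the building blocks of a conjugate gradient (CG) iteration. The sketch below shows how they fit together, under two assumptions made so that it runs without linking MKL: Fortran intrinsics and array expressions stand in for the BLAS calls (each line is commented with the routine it would correspond to), and csr_gemv is a hypothetical stand-in for mkl_dcsrgemv. CG requires a symmetric positive definite matrix, which the 4 × 4 example above is not, so a small 3 × 3 tridiagonal matrix is used instead; it is also an assumption of this example.

```fortran
program cg_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 3
  ! CSR form of the SPD matrix [2 -1 0; -1 2 -1; 0 -1 2].
  real(dp) :: a(7)  = real([2, -1, -1, 2, -1, -1, 2], dp)
  integer  :: ia(4) = [1, 3, 6, 8]
  integer  :: ja(7) = [1, 2, 1, 2, 3, 2, 3]
  real(dp) :: b(n)  = real([1, 0, 1], dp)   ! exact solution is (1,1,1)
  real(dp) :: x(n), r(n), p(n), ap(n)
  real(dp) :: alpha, beta, rr_old, rr_new
  integer  :: it

  x = 0.0_dp
  r = b                                  ! dcopy: r = b - A*0
  p = r                                  ! dcopy
  rr_old = dot_product(r, r)             ! ddot

  do it = 1, n
    call csr_gemv(n, a, ia, ja, p, ap)   ! mkl_dcsrgemv: ap = A*p
    alpha = rr_old / dot_product(p, ap)  ! ddot
    x = x + alpha * p                    ! daxpy
    r = r - alpha * ap                   ! daxpy
    if (norm2(r) < 1.0e-12_dp) exit      ! dnrm2
    rr_new = dot_product(r, r)           ! ddot
    beta = rr_new / rr_old
    p = r + beta * p                     ! scale p, then add r
    rr_old = rr_new
  end do

  print '(3f8.4)', x

contains

  ! Hypothetical stand-in for mkl_dcsrgemv (non-transposed case).
  subroutine csr_gemv(m, a, ia, ja, x, y)
    integer,  intent(in)  :: m, ia(:), ja(:)
    real(dp), intent(in)  :: a(:), x(:)
    real(dp), intent(out) :: y(:)
    integer :: i, k
    do i = 1, m
      y(i) = 0.0_dp
      do k = ia(i), ia(i+1) - 1
        y(i) = y(i) + a(k) * x(ja(k))
      end do
    end do
  end subroutine csr_gemv

end program cg_sketch
```

For this small system the iteration reaches the solution (1, 1, 1) in two steps. Replacing the commented lines with the corresponding BLAS calls changes the syntax but not the structure of the loop.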