Modern Fortran as the Port of Entry for Scientific Parallel Computing
Bill Celmaster
Digital Equipment Corporation, 129 Parker Street, Maynard,
MA 01754, USA
Abstract
New features of Fortran are changing the way in which scientists are writ-
ing, maintaining and parallelizing large analytic codes. Among the most excit-
ing kinds of language developments are those having to do with parallelism.
This paper describes Fortran 90 and the standardized language extensions
for both shared-memory and distributed-memory parallelism. Several case-
studies are examined showing how the distributed-memory extensions (High
Performance Fortran) are used both for data parallel and MIMD (multiple
instruction multiple data) algorithms.
1 A Brief History of Fortran
Fortran (FORmula TRANslation) was the result of a project begun by John
Backus at IBM in 1954. The goal of this project was to provide a way for
programmers to express mathematical formulas through a formalism that com-
puters could translate into machine instructions. Fortran has evolved contin-
uously over the years in response to the needs of users. Areas of evolution
have addressed mathematical expressivity, program maintainability, hardware
control (such as I/O) and, of course, code optimizations. In the meantime,
other languages such as C and C++ have been designed to better meet the
non-mathematical aspects of software design. By the 1980’s, pronouncements
of the ‘death of Fortran’ prompted language designers to propose extensions
to Fortran which incorporated the best features of these other high-level lan-
guages and, in addition, provided new levels of mathematical expressivity
that had become popular on supercomputers such as the CYBER 205 and the
CRAY systems. This language became standardized as Fortran 90 (ISO/IEC
1539: 1991; ANSI X3.198-1992).
Although it isn’t clear at this time whether the modernization of Fortran
can, of itself, stem the C tide, I will try to demonstrate in this paper that
modern Fortran is a viable mainstream language for parallelism. Although
parallelism is not yet part of the scientific programming mainstream, it seems
likely that parallelism will become much more common now that appropriate
standards have evolved. Just as early Fortran enabled average scientists and
engineers to program computers of the 1960’s, modern Fortran may enable av-
erage scientists and engineers to program parallel computers by the beginning
of the next millennium.
2 An Introduction to Fortran 90
Fortran 90 [1] has added some important capabilities in the area of mathematical
expressivity by introducing a wealth of natural constructs for manipulating
arrays. In addition, Fortran 90 has incorporated modern control constructs
and up-to-date features for data abstraction and data hiding.
The following code fragment illustrates the simplicity of dynamic memory
allocation with Fortran 90. It also illustrates some of the new syntax for
declaring data types, some examples of array manipulations, and an example
of how to use the new intrinsic matrix multiplication function.
      REAL, DIMENSION(:,:,:), &          ! NEW DECLARATION SYNTAX
            ALLOCATABLE :: GRID          ! DYNAMIC STORAGE
      REAL*8 A(4,4), B(4,4), C(4,4)      ! OLD DECLARATION SYNTAX
      READ *, N                          ! READ IN THE DIMENSION
      ALLOCATE (GRID(N+2, N+2, 2))       ! ALLOCATE THE STORAGE
      GRID(:,:,1) = 1.0                  ! ASSIGN PART OF ARRAY
      GRID(:,:,2) = 2.0                  ! ASSIGN REST OF ARRAY
      A = GRID(1:4,1:4,1)                ! ASSIGNMENT
      B = GRID(2:5,1:4,2)                ! ASSIGNMENT
      C = MATMUL(A,B)                    ! MATRIX MULTIPLICATION
In general, many of the new features of Fortran 90 help compilers to per-
form architecture-targeted optimizations. More importantly, these features
help programmers express basic numerical algorithms in ways (such as using
the intrinsic function MATMUL above) which are inherently more amenable to
optimizations that take advantage of multiple arithmetic units.
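As a hypothetical sketch (the names and sizes here are illustrative, not taken from the paper), the contrast between element-by-element loops and whole-array expressions might look as follows; the array form is inherently parallel and leaves the compiler free to schedule the work across multiple arithmetic units:

```fortran
! Illustrative sketch only: the same update written two ways.
      REAL V(1000), W(1000)
      INTEGER I

! Fortran 77 style: an explicit element-by-element loop.
      DO I = 1, 1000
         W(I) = 2.0 * V(I) + 1.0
      END DO

! Fortran 90 style: a single whole-array expression.
      W = 2.0 * V + 1.0
```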
3 A brief history of parallel Fortran: PCF and HPF
Over the past 10 years, two significant efforts have been undertaken to stan-
dardize parallel extensions to Fortran. The first of these was under the aus-
pices of the Parallel Computing Forum (PCF) and targeted global-shared-
memory architectures. The PCF design center was control parallelism, with
little attention to language features for managing data locality. The 1991
PCF standard established an approach to shared-memory extensions of For-
tran, and also established an interim syntax. These extensions were later
somewhat modified and incorporated in the standard extensions now known
as ANSI X3H5. In addition, compiler technologies have evolved to the point
that compilers are often able to detect shared-memory parallelization oppor-
tunities and automatically decompose codes.
This kind of parallelism is all well and good provided that data can be ac-
cessed democratically and quickly by all processors. With modern hardware,
this amounts to saying that memory latencies are lower than 100 nanoseconds,
and memory bandwidths are greater than 100 MB/s. Those kinds of parame-
ters characterize many of today’s shared-memory servers, but have not char-
acterized any massively parallel or network parallel systems. As a result, at
about the time of adoption of the ANSI X3H5 standard, another standardiza-
tion committee began work on extending Fortran 90 for distributed-memory
architectures, with the goal of providing a language suitable for scalable com-
puting. This committee became known as the High Performance Fortran
Forum, and produced in 1993 the High Performance Fortran (HPF) language
specification. The HPF design center is data parallelism and many data place-
ment directives are provided for the programmer to optimize data locality. In
addition, HPF includes ways to specify a more general style of MIMD (mul-
tiple instruction multiple data) execution, in which separate processors can
independently work on different parts of the code. This MIMD specification
is formalized in such a way as to make the resulting code far more main-
tainable than previous message-library ways of specifying MIMD distributed
parallelism.
Digital’s products support both the PCF and HPF extensions. The HPF
extensions are supported as part of the DEC Fortran 90 compiler, and the PCF
extensions are supported through the Digital KAP™ Fortran optimizer.
4 Cluster Fortran parallelism
High Performance Fortran V1.1 is the only language standard, today, for
distributed-memory parallel computing. The most significant way in which
HPF extends Fortran 90 is through a rich family of data placement directives.
There are also library routines and some extensions for control parallelism.
Without question, HPF is the simplest way of decreasing turnaround time via
parallelism, on clusters (a.k.a. farms) of workstations or servers. Furthermore,
HPF has over the past year become widely available and is supported on the
platforms of all major vendors.
HPF is often considered to be a data parallel language. That is, it facili-
tates parallelization of array-based algorithms in which the instruction stream
can be described as a sequence of array manipulations, each of which is in-
herently parallel. What is less well known is the fact that HPF also provides a
very powerful way of expressing the more general parallelism known as Multi-
ple Instruction Multiple Data (MIMD) parallelism. This kind of parallelism is
one in which individual processors can operate simultaneously on independent
instruction streams, and generally exchange data either by explicitly sharing
memory or exchanging messages. Several case-studies follow which illustrate
both the data parallel and the MIMD style of programming.
4.1 Finite-difference algorithms
As the most mind-bogglingly simple illustration of HPF in action, consider a
simple one-dimensional grid problem in which each grid value is updated as a
linear combination of its (previous) nearest neighbors.
For each interior grid index i, the update algorithm is
y(i) = x(i-1) + x(i+1) - 2*x(i)
In Fortran 90, the resulting DO loop can be expressed as a single array as-
signment. How would this be parallelized? The simplest way to imagine
parallelization would be to partition the X and Y arrays into equally sized
chunks, with one chunk on each processor. Each iteration could proceed si-
multaneously, and at the chunk boundaries, some communication would occur
between processors. The HPF implementation of this idea is simply to take
the Fortran 90 code and add to it two data placement statements. One of
these declares that the X array should be distributed into chunks or blocks.
The other declares that the Y array should be distributed in a way which
aligns elements to the same processors as the corresponding elements of the X
array. The resultant code, for arrays with 1000 elements, is:
!HPF$ DISTRIBUTE X(BLOCK)
!HPF$ ALIGN Y WITH X
      REAL*8 X(1000), Y(1000)
      <initialize X>
      Y(2:999) = X(1:998) + X(3:1000) - 2 * X(2:999)
      <check the answer>
      END
The HPF compiler is responsible for generating all of the boundary-element
communication code, and for determining the most even distribution of arrays.
Of course, in this example, the time to communicate boundary-element data
between processors is generally far greater than the time to perform floating
point operations. If latency is too large, the problem isn’t worth parallelizing.
One example of a low-latency network is Digital’s Memory Channel™
cluster interconnect, with HPF latencies on the order of 10 microseconds. With
such low latency, the above example may be worth parallelizing for, say, 4
processors.
This one-dimension example can be generalized to various two- and three-
dimensional examples that typically arise through the solutions of partial dif-
ferential equations. The distribution directives generalize to allow various
kinds of rectangular partitioning of the arrays. Just as in the 1-D case,
the compiler takes care of boundary communications. For large problems,
the communications times are governed less by the latency, and more by
the bandwidth. Digital’s Fortran 90 compiler and runtime libraries perform
several optimizations of those communications, including so-called message-
vectorization and shadow-edge replication.
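As a hypothetical sketch (the array names and sizes here are illustrative, not taken from the paper), a 2-D analogue of the earlier example might distribute a grid in blocks along both dimensions, with the compiler again generating the shadow-edge communications:

```fortran
      REAL*8 U(1000,1000), UNEW(1000,1000)
!HPF$ DISTRIBUTE U(BLOCK,BLOCK)
!HPF$ ALIGN UNEW WITH U

!     Five-point update on the interior; boundary (shadow-edge)
!     exchanges between neighboring processors are generated
!     automatically by the compiler.
      UNEW(2:999,2:999) = 0.25 * ( U(1:998,2:999) + U(3:1000,2:999)  &
                                 + U(2:999,1:998) + U(2:999,3:1000) )
```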
4.2 Communications and MIMD programming with HPF
Since HPF can be used to place data, it stands to reason that communication
can be forced between processors. The beauty of HPF is that all of this can
be done in the context of mathematics, rather than the context of distributed
parallel programming. The following code fragment illustrates how this is
done.
!HPF$ DISTRIBUTE (*,BLOCK) :: U
!HPF$ ALIGN V WITH U
      REAL*8 U(N,2), V(N,2)
      <initialize arrays>
      V(:,1) = U(:,2)    ! MOVE A VECTOR BETWEEN PROCESSORS
On two processors, the two columns of the U and V arrays are each on
different processors, thus the array assignment causes one of those columns
to be moved to the other processor. Notice that the programmer needn’t be
explicit about the parallelism. In fact, scientists and engineers rarely want
to express parallelism. If you examine typical message-passing programs, the
messages often express communication of vector and array information.
Having said this, it turns out that despite the fervent hopes of program-
mers, there are times when a parallel algorithm can most simply be ex-
pressed as a collection of individual instruction streams operating on local
data. This MIMD style of programming can be expressed in HPF with the
EXTRINSIC(HPF_LOCAL) declaration, as illustrated by continuing the above
code segment as follows:
      CALL CFD(V)    ! DO LOCAL WORK ON THE LOCAL PART OF V
      <finish the main program>

      EXTRINSIC(HPF_LOCAL) SUBROUTINE CFD(VLOCAL)
      REAL*8, DIMENSION(:,:) :: VLOCAL
!HPF$ DISTRIBUTE *(*,BLOCK) :: VLOCAL
      <do arbitrarily complex work with VLOCAL>
      END
Because the subroutine CFD is declared to be EXTRINSIC(HPF_LOCAL), the
HPF compiler executes that routine independently on each processor (or, more
generally, the execution is done once per peer process), operating on routine-
local data. As for the array argument, V, which is passed to the CFD routine,
each processor operates only on its local slice of that array. In the specific
example above on 2 processors, the first one operates on the first column of V
and the second one operates on the second column of V.
4.3 Clusters of SMP systems
During these last few years of the second millennium, we are witnessing the
emergence of systems that consist of clusters of shared-memory computers.
This exciting development is a natural evolution of the exponential increase
in performance of mid-priced ($100K–$1000K) systems.
There are two natural ways of writing parallel Fortran programs for such
systems. The easiest way is to use HPF, and to target the total number of
processors. So, for example, if there were two SMP systems each with 4 pro-
cessors, one would compile the HPF program for 8 processors (more generally,
for 8 peers). If the program contained, for instance, block-distribution direc-
tives, the affected arrays would be split up into 8 chunks of contiguous array
sections.
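A minimal sketch of this first approach (the array name and size are illustrative, not from the paper): compiled for 8 peers, a BLOCK distribution splits the array into 8 contiguous chunks, one per peer, regardless of which SMP box each peer runs on:

```fortran
      REAL*8 X(1000)
!HPF$ DISTRIBUTE X(BLOCK)
!     On 8 peers, X is split into 8 contiguous chunks of
!     125 elements each; each peer owns one chunk.
```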
The second way of writing parallel Fortran programs for clustered-SMP
systems is to use HPF to target the total number of SMP machines, and
then to use PCF (or, more generally, shared-memory extensions) to achieve
parallelism locally on each of the SMP machines. This technique will generally
be more complex than the method described above, and at this time it is
unclear whether it would have any advantages.
References
1. J. Adams, W. Brainerd, J. Martin, B. Smith and J. Wagener, Fortran 90 Handbook, McGraw-Hill, Inc., NY, 1992.

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 

Modern Fortran as the Port of Entry for Scientific Parallel Computing

Bill Celmaster
Digital Equipment Corporation, 129 Parker Street, Maynard, MA 01754, USA

Abstract

New features of Fortran are changing the way in which scientists are writing, maintaining and parallelizing large analytic codes. Among the most exciting kinds of language developments are those having to do with parallelism. This paper describes Fortran 90 and the standardized language extensions for both shared-memory and distributed-memory parallelism. Several case-studies are examined showing how the distributed-memory extensions (High Performance Fortran) are used both for data parallel and MIMD (multiple instruction multiple data) algorithms.

1 A Brief History of Fortran

Fortran (FORmula TRANslation) was the result of a project begun by John Backus at IBM in 1954. The goal of this project was to provide a way for programmers to express mathematical formulas through a formalism that computers could translate into machine instructions. Fortran has evolved continuously over the years in response to the needs of users. Areas of evolution have addressed mathematical expressivity, program maintainability, hardware control (such as I/O) and, of course, code optimizations. In the meantime, other languages such as C and C++ have been designed to better meet the non-mathematical aspects of software design. By the 1980s, pronouncements of the ‘death of Fortran’ prompted language designers to propose extensions to Fortran which incorporated the best features of these other high-level languages and, in addition, provided new levels of mathematical expressivity that had become popular on supercomputers such as the CYBER 205 and the CRAY systems. This language became standardized as Fortran 90 (ISO/IEC 1539:1991; ANSI X3.198-1992).
Although it isn’t clear at this time whether the modernization of Fortran can, of itself, stem the C tide, I will try to demonstrate in this paper that modern Fortran is a viable mainstream language for parallelism. Although parallelism is not yet part of the scientific programming mainstream, it seems likely that parallelism will become much more common now that appropriate standards have evolved. Just as early Fortran enabled average scientists and engineers to program computers of the 1960s, modern Fortran may enable average scientists and engineers to program parallel computers by the beginning of the next millennium.

2 An Introduction to Fortran 90

Fortran 90 [1] has added some important capabilities in the area of mathematical expressivity by introducing a wealth of natural constructs for manipulating arrays. In addition, Fortran 90 has incorporated modern control constructs and up-to-date features for data abstraction and data hiding. The following code fragment illustrates the simplicity of dynamic memory allocation with Fortran 90. It also illustrates some of the new syntax for declaring data types, some examples of array manipulations, and an example of how to use the new intrinsic matrix multiplication function.

      REAL, DIMENSION(:,:,:), &        ! NEW DECLARATION SYNTAX
           ALLOCATABLE :: GRID         ! DYNAMIC STORAGE
      REAL*8 A(4,4), B(4,4), C(4,4)    ! OLD DECLARATION SYNTAX
      READ *, N                        ! READ IN THE DIMENSION
      ALLOCATE (GRID(N+2, N+2, 2))     ! ALLOCATE THE STORAGE
      GRID(:,:,1) = 1.0                ! ASSIGN PART OF ARRAY
      GRID(:,:,2) = 2.0                ! ASSIGN REST OF ARRAY
      A = GRID(1:4,1:4,1)              ! ASSIGNMENT
      B = GRID(2:5,1:4,2)              ! ASSIGNMENT
      C = MATMUL(A,B)                  ! MATRIX MULTIPLICATION

In general, many of the new features of Fortran 90 help compilers to perform architecture-targeted optimizations. More importantly, these features help programmers express basic numerical algorithms in ways (such as using the intrinsic function MATMUL above) which are inherently more amenable to optimizations that take advantage of multiple arithmetic units.

3 A Brief History of Parallel Fortran: PCF and HPF

Over the past 10 years, two significant efforts have been undertaken to standardize parallel extensions to Fortran. The first of these was under the auspices of the Parallel Computing Forum (PCF) and targeted global-shared-memory architectures. The PCF design center was control parallelism, with little attention to language features for managing data locality. The 1991 PCF standard established an approach to shared-memory extensions of Fortran, and also established an interim syntax. These extensions were later somewhat modified and incorporated in the standard extensions now known as ANSI X3H5.
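The semantics of the fragment above can be sketched in a language-neutral way. The following Python sketch (an illustration only; the names `matmul` and `demo` are invented here, and plain nested lists stand in for Fortran arrays) mirrors the runtime allocation, the whole-plane assignments, the shifted rectangular slices, and the matrix product:

```python
def matmul(a, b):
    """Plain triple-loop matrix product, standing in for Fortran's MATMUL."""
    rows, cols, inner = len(a), len(b[0]), len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def demo(n):
    # ALLOCATE (GRID(N+2, N+2, 2)) -- storage sized at runtime
    grid = [[[0.0, 0.0] for _ in range(n + 2)] for _ in range(n + 2)]
    # GRID(:,:,1) = 1.0 and GRID(:,:,2) = 2.0 -- whole-plane assignments
    for row in grid:
        for cell in row:
            cell[0], cell[1] = 1.0, 2.0
    # A = GRID(1:4,1:4,1): a 4x4 slice of the first plane (all ones)
    a = [[grid[i][j][0] for j in range(4)] for i in range(4)]
    # B = GRID(2:5,1:4,2): a row-shifted 4x4 slice of the second plane (all twos)
    b = [[grid[i + 1][j][1] for j in range(4)] for i in range(4)]
    # C = MATMUL(A,B): every entry sums four products 1.0*2.0, giving 8.0
    return matmul(a, b)
```

With n = 4, every entry of the result is 8.0, which is what the Fortran fragment would compute for C.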
In addition, compiler technologies have evolved to the point that compilers are often able to detect shared-memory parallelization opportunities and automatically decompose codes.

This kind of parallelism is all well and good provided that data can be accessed democratically and quickly by all processors. With modern hardware, this amounts to saying that memory latencies are lower than 100 nanoseconds, and memory bandwidths are greater than 100 MB/s. Those kinds of parameters characterize many of today’s shared-memory servers, but have not characterized any massively parallel or network parallel systems. As a result, at about the time of adoption of the ANSI X3H5 standard, another standardization committee began work on extending Fortran 90 for distributed-memory architectures, with the goal of providing a language suitable for scalable computing. This committee became known as the High Performance Fortran Forum, and produced in 1993 the High Performance Fortran (HPF) language specification. The HPF design center is data parallelism, and many data placement directives are provided for the programmer to optimize data locality. In addition, HPF includes ways to specify a more general style of MIMD (multiple instruction multiple data) execution, in which separate processors can independently work on different parts of the code. This MIMD specification is formalized in such a way as to make the resulting code far more maintainable than previous message-library ways of specifying MIMD distributed parallelism.

Digital’s products support both the PCF and HPF extensions. The HPF extensions are supported as part of the DEC Fortran 90 compiler, and the PCF extensions are supported through the Digital KAP™ Fortran optimizer.

4 Cluster Fortran Parallelism

High Performance Fortran V1.1 is the only language standard, today, for distributed-memory parallel computing. The most significant way in which HPF extends Fortran 90 is through a rich family of data placement directives. There are also library routines and some extensions for control parallelism. Without question, HPF is the simplest way of decreasing turnaround time via parallelism on clusters (a.k.a. farms) of workstations or servers. Furthermore, HPF has over the past year become widely available and is supported on the platforms of all major vendors.

HPF is often considered to be a data parallel language. That is, it facilitates parallelization of array-based algorithms in which the instruction stream can be described as a sequence of array manipulations, each of which is inherently parallel. What is less well known is the fact that HPF also provides a very powerful way of expressing the more general parallelism known as Multiple Instruction Multiple Data (MIMD) parallelism. This kind of parallelism is one in which individual processors can operate simultaneously on independent instruction streams, and generally exchange data either by explicitly sharing memory or exchanging messages. Several case-studies follow which illustrate both the data parallel and the MIMD style of programming.
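As a language-neutral preview of the data parallel style, the following Python sketch (invented for illustration; the names `update` and `update_chunked` are not part of HPF, and an HPF compiler arranges all of this automatically) computes a nearest-neighbor array update serially, and then chunk by chunk as p processors would, where each chunk needs only one boundary value from each neighboring chunk:

```python
def update(x):
    """Serial nearest-neighbor update y(i) = x(i-1) + x(i+1) - 2*x(i)
    over interior points; the two endpoints are left unchanged."""
    y = list(x)
    for i in range(1, len(x) - 1):
        y[i] = x[i - 1] + x[i + 1] - 2 * x[i]
    return y

def update_chunked(x, p):
    """The same update computed over p contiguous chunks, as p 'processors'
    would. Reads of x[i-1] or x[i+1] that fall in a neighboring chunk model
    the boundary-element communication between processors."""
    n = len(x)
    block = -(-n // p)                       # ceiling division: chunk size
    y = list(x)
    for rank in range(p):
        lo, hi = rank * block, min((rank + 1) * block, n)
        for i in range(max(lo, 1), min(hi, n - 1)):
            y[i] = x[i - 1] + x[i + 1] - 2 * x[i]
    return y
```

However the array is partitioned, the chunked result matches the serial one; only the boundary reads cross chunk edges, which is why the communication cost stays proportional to the number of chunk boundaries rather than to the array size.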
4.1 Finite-Difference Algorithms

As the most mind-bogglingly simple illustration of HPF in action, consider a simple one-dimensional grid problem in which each grid value is updated as a linear combination of its (previous) nearest neighbors. For each interior grid index i, the update algorithm is

      y(i) = x(i-1) + x(i+1) - 2*x(i)

In Fortran 90, the resulting DO loop can be expressed as a single array assignment. How would this be parallelized? The simplest way to imagine parallelization would be to partition the X and Y arrays into equally sized chunks, with one chunk on each processor. Each iteration could proceed simultaneously, and at the chunk boundaries, some communication would occur between processors. The HPF implementation of this idea is simply to take the Fortran 90 code and add to it two data placement statements. One of these declares that the X array should be distributed into chunks or blocks. The other declares that the Y array should be distributed in a way which aligns elements to the same processors as the corresponding elements of the X array. The resultant code, for arrays with 1000 elements, is:
      REAL*8 X(1000), Y(1000)
!HPF$ DISTRIBUTE X(BLOCK)
!HPF$ ALIGN Y WITH X
      <initialize X>
      Y(2:999) = X(1:998) + X(3:1000) - 2*X(2:999)
      <check the answer>
      END

The HPF compiler is responsible for generating all of the boundary-element communication code, and for determining the most even distribution of arrays. Of course, in this example, the time to communicate boundary-element data between processors is generally far greater than the time to perform the floating point operations. If latency is too large, the problem isn’t worth parallelizing. One example of a low-latency network is Digital’s Memory Channel™ cluster interconnect, with HPF latencies on the order of 10 microseconds. With such low latency, the above example may be worth parallelizing for, say, 4 processors.

This one-dimensional example can be generalized to various two- and three-dimensional examples that typically arise through the solutions of partial differential equations. The distribution directives generalize to allow various kinds of rectangular partitioning of the arrays. Just as in the 1-D case, the compiler takes care of boundary communications. For large problems, the communication times are governed less by the latency, and more by the bandwidth. Digital’s Fortran 90 compiler and runtime libraries perform several optimizations of those communications, including so-called message vectorization and shadow-edge replication.

4.2 Communications and MIMD Programming with HPF

Since HPF can be used to place data, it stands to reason that communication can be forced between processors. The beauty of HPF is that all of this can be done in the context of mathematics, rather than the context of distributed parallel programming. The following code fragment illustrates how this is done.

      REAL*8 U(N,2), V(N,2)
!HPF$ DISTRIBUTE (*,BLOCK) :: U
!HPF$ ALIGN V WITH U
      <initialize arrays>
      V(:,1) = U(:,2)    ! MOVE A VECTOR BETWEEN PROCESSORS

On two processors, the two columns of the U and V arrays are each on different processors, thus the array assignment causes one of those columns to be moved to the other processor. Notice that the programmer needn’t be explicit about the parallelism. In fact, scientists and engineers rarely want to express parallelism. If you examine typical message-passing programs, the messages often express communication of vector and array information.

Having said this, it turns out that despite the fervent hopes of programmers, there are times when a parallel algorithm can most simply be expressed as a collection of individual instruction streams operating on local data. This MIMD style of programming can be expressed in HPF with the EXTRINSIC(HPF_LOCAL) declaration, as illustrated by continuing the above code segment as follows:

      CALL CFD(V)    ! DO LOCAL WORK ON THE LOCAL PART OF V
      <finish the main program>

      EXTRINSIC(HPF_LOCAL) SUBROUTINE CFD(VLOCAL)
      REAL*8, DIMENSION(:,:) :: VLOCAL
!HPF$ DISTRIBUTE *(*,BLOCK) :: VLOCAL
      <do arbitrarily complex work with VLOCAL>
      END

Because the subroutine CFD is declared to be EXTRINSIC(HPF_LOCAL), the HPF compiler executes that routine independently on each processor (or more generally, the execution is done once per peer process), operating on routine-local data. As for the array argument, V, which is passed to the CFD routine, each processor operates only on its local slice of that array. In the specific example above on 2 processors, the first one operates on the first column of V and the second one operates on the second column of V.

4.3 Clusters of SMP Systems

During these last few years of the second millennium, we are witnessing the emergence of systems that consist of clusters of shared-memory computers. This exciting development is a natural evolution of the exponential increase in performance of mid-priced ($100K–$1000K) systems.

There are two natural ways of writing parallel Fortran programs for such systems. The easiest way is to use HPF, and to target the total number of processors. So, for example, if there were two SMP systems each with 4 processors, one would compile the HPF program for 8 processors (more generally, for 8 peers). If the program contained, for instance, block-distribution directives, the affected arrays would be split up into 8 chunks of contiguous array sections.
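That chunking can be made concrete with a short sketch. The Python function below (the name `block_chunks` is invented here, and it assumes the common ceiling-division BLOCK rule; a particular HPF runtime may balance remainder elements differently) computes the contiguous index range each peer would own:

```python
def block_chunks(n, peers):
    """Contiguous half-open [lo, hi) index ranges for a BLOCK-distributed
    array of n elements over `peers` peer processes, using blocks of
    ceil(n / peers) elements; trailing peers may own a short or empty range."""
    block = -(-n // peers)                   # ceiling division
    chunks = []
    for rank in range(peers):
        lo = min(rank * block, n)
        hi = min(lo + block, n)
        chunks.append((lo, hi))
    return chunks
```

For a 1000-element array over 8 peers this yields eight ranges of 125 elements each, and every element belongs to exactly one peer, which is the partitioning the block-distribution directive would produce in the two-SMP example above.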
The second way of writing parallel Fortran programs for clustered-SMP systems is to use HPF to target the total number of SMP machines, and then to use PCF (or more generally, shared-memory extensions) to achieve parallelism locally on each of the SMP machines. This technique will generally be more complex than the method described above, and at this time it is unclear whether it would have any advantages.

References

1. J. Adams, W. Brainerd, J. Martin, B. Smith and J. Wagener, Fortran 90 Handbook, McGraw-Hill, Inc., NY, 1992