Code_Aster ® Version 8.1
Title: Implementation of the FETI algorithm
Date: 15/09/05
Author(s): O. BOITEAU
Key: D9.03.01-A
Data-Processing Description Manual — Booklet D9.03 — HT-66/05/003/A
Organization(s): EDF-R&D/SINETICS
Document: D9.03.01
Implementation of the FETI algorithm
Summary:
This document describes the software implementation of the FETI linear-system solution algorithm. We use the notation of the Reference note [R6.01.03] [bib1] and also rely on that of the User documentation [U4.50.01] and of the Development documents [D4.06.05], [D4.06.07], [D4.06.10], [D4.06.11] and [D4.06.21]. The document provides the simplified flow chart of the solution process, in sequential as well as in parallel mode, highlighting its main logical hinges, its call tree, and the main variables involved together with their contents. The specific features and the philosophy of the parallelism that has been set up are detailed in particular.
We have, on the other hand, chosen not to burden the presentation by mentioning the calls to the supervisor routines, to the JEVEUX object-management and low-level handling routines (VTDEFS, VTCREB…), or the details of the routines preliminary to the unfolding of the FETI algorithm (ASSMAM, MEACMV, MERESO…).
1 Simplified flow chart
Figure 1-a: Simplified flow chart before ALFETI

Sequence of the main routines and of their global function:
· Reading of the calculation data (mesh, materials…), pretreatments… Entry into the possible calculation loops: load increment, Newton step…
· CRESOL: preparation of the solver data: master SOLVEUR SD and slave SOLVEUR SDs.
· NUMERO: renumbering and symbolic factorization, constitution of the master and slave NUME_DDL SDs.
· ASSMAM/ASSVEC: assembly of the stiffness matrices and of the local second-member vectors; filling of the master and slave MATR_ASSE/CHAM_NO SDs.
· PRERES: factorization of the local stiffness matrices $(K_i)^+$ and search for their rigid body modes $B_i$.
· RESOUD/ALFETI: resolution via the FETI algorithm itself, see [Figure 1-b].

Specific features of the parallelism (annotations of the figure):
· All the processors carry out the same operations up to this level and therefore have access to all the known JEVEUX data (mesh, fields resulting from the pretreatments…).
· Each proc. j computes the data relating to the perimeter of its subdomains: $(K_{j_k})^+$ and $B_{j_k}$. The subdomains $j_1, j_2, \ldots, j_k$ are associated with processor j.
· The master SOLVEUR SD is built by each proc. and its .FETS points only to the SOLVEUR SDs of the slave subdomains $j_k$. Idem for the MATR_ASSE/CHAM_NO (.FETM and .FETC pointers) and for the NUME_DDL (.FETN pointer).
Figure 1-b: Simplified flow chart in ALFETI (level 1)

· FETGGT: construction of $G_I := [R_1 B_1 \;\ldots\; R_q B_q]$ and of $G_I^T G_I$.
· FETINL: calculation of $e := [B_1^T f_1 \;\ldots\; B_q^T f_q]^T$ and of $\lambda^0 := G_I (G_I^T G_I)^{-1} e$.
· FETRIN: construction of the initial residual $r^0 := \sum_i R_i (K_i)^+ (f_i - R_i^T \lambda^0)$.
· FETPRJ: calculation of the initial projected residual $g^0 := P r^0 = [I - G_I (G_I^T G_I)^{-1} G_I^T] r^0$.
· FETSCA/FETPRC: calculation of $\tilde h^0 := A M^{-1} A g^0$.
· FETPRJ: calculation of $h^0 := P \tilde h^0$ and $p^0 = h^0$.
· Stopping test: is $\|g^0\|$ quasi-zero? If yes, calculation of the local solution $u_i^{sol}$ on each subdomain and reconstruction of the global solution $u^{sol}$; if not, GCPPC loop, see [Figure 1-c].

Annotations of the figure on the knowledge of the quantities by the processors: the coarse-problem JEVEUX objects and $e$ are known only to proc. 0, while $\lambda^0$ is known to all; $g^0$ and $\tilde h^0$ are computed by proc. 0 only, whereas $h^0$ and $p^0$ are known to all the procs.; $u_i^{sol}$ is known to the associated proc. and $u^{sol}$ to proc. 0.
Figure 1-c: Simplified flow chart in ALFETI (level 2)

Iteration k of the GCPPC loop:
· FETFIV: construction of $z^k := \sum_i R_i (K_i)^+ R_i^T p^k$ (known only to proc. 0).
· DDOT: calculation of the descent parameter $\alpha^k$ (known only to proc. 0).
· FETPRJ/DAXPY: updates $\lambda^{k+1} := \lambda^k + \alpha^k p^k$ and $g^{k+1} := g^k - \alpha^k P z^k$ ($\lambda^{k+1}$ known only to proc. 0, $g^{k+1}$ to all).
· Stopping test: $\|g^{k+1}\| < \text{RESI\_RELA} \cdot \|g^0\|$? If yes, calculation of the local solution $u_i^{sol}$ on each subdomain and reconstruction of the global solution $u^{sol}$ ($u_i^{sol}$ known to the associated proc., $u^{sol}$ to proc. 0). If not:
· FETSCA/FETPRC/FETPRJ: calculation of $h^{k+1} := P A M^{-1} A g^{k+1}$.
· FETREO: update of the descent direction $p^{k+1}$ (reorthogonalized or not).
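For readability, the operations of [Figure 1-b] and [Figure 1-c] can be gathered into a single statement of the projected preconditioned conjugate gradient (GCPPC) applied to the FETI interface problem. The block below is only a compact restatement, in LaTeX form, of the formulas reconstructed in the two figures; it is a sketch, and the authoritative definitions of the operators ($R_i$, $B_i$, $(K_i)^+$, $G_I$, the projector $P$, the scaling $A$ and the preconditioner $M^{-1}$) remain those of the Reference note [R6.01.03].

% Initialization (Figure 1-b)
G_I := [R_1 B_1 \;\cdots\; R_q B_q], \qquad e := [B_1^T f_1 \;\cdots\; B_q^T f_q]^T,
\lambda^0 := G_I (G_I^T G_I)^{-1} e, \qquad P := I - G_I (G_I^T G_I)^{-1} G_I^T,
r^0 := \sum_i R_i (K_i)^+ (f_i - R_i^T \lambda^0), \qquad g^0 := P r^0, \qquad p^0 := h^0 := P A M^{-1} A g^0.

% Iteration k (Figure 1-c), repeated until \|g^{k+1}\| < \text{RESI\_RELA}\,\|g^0\|
z^k := \sum_i R_i (K_i)^+ R_i^T p^k, \qquad \alpha^k := (g^k \cdot p^k) / (p^k \cdot z^k),
\lambda^{k+1} := \lambda^k + \alpha^k p^k, \qquad g^{k+1} := g^k - \alpha^k P z^k,
h^{k+1} := P A M^{-1} A g^{k+1},

and $p^{k+1}$ is obtained from $h^{k+1}$, reorthogonalized or not against the previous descent directions.

% Solution once converged (\lambda^{sol} the last iterate)
r^{sol} := \sum_i R_i (K_i)^+ (f_i - R_i^T \lambda^{sol}), \qquad \alpha^{sol} := (G_I^T G_I)^{-1} G_I^T r^{sol},
u_i^{sol} := (K_i)^+ (f_i - R_i^T \lambda^{sol}) - B_i \alpha_i^{sol}.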
2 Detailed flow chart and call tree
(Each entry of the table below gives: the calling routine — the routines called at level 1 — the routines called at level 2 — the details of the operations — the information relative to the processors.)
Upstream of the command calling FETI (MECA_STATIQUE…)
  · Reading of the calculation data (mesh, materials, SD_FETI [D4.06.21]…), pretreatments…
  · Entry into the possible calculation loops: load increment, Newton step…
  Proc.: all the procs. have carried out the same operations up to this level and thus have access to all the known JEVEUX data (mesh, fields resulting from the pretreatments…).

CRESOL
  · Reading of the parameters attached to the SOLVEUR factor keyword [U4.50.01], (O*/O)
  · Creation and initialization of the related objects '&FETI.INFO…' (for the monitoring [D4.06.21 §4]) and SDFETI(1:8)//'.MAILLE.NUMSD' (for the assembly routines [D4.06.21 §4]), (P/P)
  · Creation of the pointer SOLVEUR_maître.FETS towards the slave SOLVEURs. (P/P)
  Proc.: the master SOLVEUR SD is built by each proc. and its .FETS points only to the SOLVEUR SDs of the slave subdomains.

Note:
* The code O/N means that the operation is carried out, or that the variable is known, only by the master proc. (O for Oui = Yes) and not by the other procs. (N for Non = No). P (Partially) is also used to indicate that the operations carried out concern only the subdomains of the perimeter of the current proc.
FETMPI
  · Distribution of the subdomains among the procs. and determination of the number of procs.
  Proc.: MPI_COMM_SIZE, MPI_COMM_RANK.

Loop on the subdomains concerned** by the current proc. — CRESO1
  · Constitution of the SOLVEUR SDs [D4.06.11] of the subdomains ("slaves"),
  · Check of the meshes of the model against the meshes of the subdomains.
  Proc.: P/P (in sequential mode as well).
End of loop

CRESO1
  · Constitution of the SOLVEUR SD of the global domain ("master"),
  · Check of the meshes of the model against the meshes of the subdomains.
  Proc.: O/O (in sequential mode as well).

Note:
** In sequential mode, all the subdomains are concerned by the current processor, which is also the master processor, i.e. proc. 0. In parallel, the current proc. j is allotted a set of contiguous subdomains j1, j2 … jk. In the loops over the subdomains, this information is conveyed via the JEVEUX object '&FETI.LISTE.SD.MPI' ([D4.06.21 §4]), which filters the loop indices.

NUMERO
  · Constitution of the list of global loads for the whole model.
  Proc.: O/O.
NUMER2
  · Constitution of the master NUME_DDL [D4.06.07] and of its .FETN pointer.
  Proc.: O/O. The .FETN points only to the SDs of the slave subdomains.

NUMER2 — NUEFFE
  · Creation of the master NUME_EQUA.
  Proc.: O/O.

NUMER2 — PROFMA
  · The master STORAGE SD is not created.
  Proc.: O/O.

  · Checking of the coherence of the SD_FETI with respect to the model and to the loads (controlled by the keyword VERIF_SDFETI).
  Proc.: O/O.

FETMPI
  · Determination of the number of procs. and of the rank of the current proc.
  Proc.: MPI_COMM_SIZE, MPI_COMM_RANK.

Loop on the subdomains concerned by the current proc.
  EXLIM1
    · Creation of the LIGREL of the physical meshes of the subdomain. (P/P)
  EXLIM2
    · Constitution of the list of the load LIGRELs (with late meshes) impacting the subdomain, (P/P)
    · Their possible projections onto several subdomains; ad hoc filling of the .FELi SDs associated with these projections. (P/P)
  NUMER2
    · Constitution of the slave NUME_DDL. (P/P)
  NUMER2 — NUEFFE
    · Creation of the slave NUME_EQUA. (P/P)
End of loop

NUMER2 — PROFMA
  · The slave STORAGE SD is created.
  Proc.: P/P.

  · Test of the identity of the PROF_CHNO.NUEQ ([D4.06.07 §5.3]).
  Proc.: O/N.

ASSMAM/ASSVEC
  · Constitution of the master MATR_ASSE/CHAM_NO [D4.06.10], [D4.06.05] and of their .FETM/.FETC pointers. The useless MATR_ASSE.VALE is not built.
  Proc.: O/O. Their .FETM/.FETC point only to the SDs of the slave subdomains.

Loop on the subdomains concerned by the current proc. … End of loop
  · Constitution of the slave MATR_ASSE/CHAM_NO, relying on the auxiliary objects:
    · SDFETI(1:8)//'.MAILLE.NUMSD', which determines, in the loops over the global data, whether the considered mesh concerns the subdomain,
    · .FELi, to make the link between the numbering of the late meshes or nodes and their local numbering in the subdomain (that of the slave NUME_DDL).
  Proc.: P/P.

PRERES
  · Update of the 'DOCU' field of the master MATR_ASSE.REFA in order to get through RESOUD and the recopy of the second member into the solution vector.
  Proc.: O/O.
Loop on the subdomains concerned by the current proc. — FETFAC — TLDLG2
  · Filling of the slave MATR_ASSE.VALF of the subdomains associated with the current proc.: $(K_i)^+$, (P/P)
  · Calculation of the rigid body modes and filling of the temporary objects '&&FETFAC.FETI.MOCR' (rigid modes) and '&&FETFAC.FETI.INPN' (indices of the quasi-null pivots), (P/P)
  · Checks of the rigid body modes and of a Moore-Penrose condition (if INFO_FETI(6:6) is activated [U4.50.01 §3.5]).
End of loop

FETFAC
  · Filling of the objects MATR_ASSE.FETF, .FETP and .FETR ([D4.06.10 §3]): the $B_i$.
  Proc.: P/P.
RESOUD — Loop on the subdomains concerned by the current proc. … End of loop
  · Check that the PROF_CHNO of the MATR_ASSE is identical to that of the second member (for the master and the slave SDs), (P/P)
  · Check that the global MATR_ASSE and its slave MATR_ASSEs have indeed been factorized. (P/P)

RESFET — Loop on the subdomains concerned by the current proc. … End of loop
  · Update of the temporary &INT objects of the local MATR_ASSEs.
  Proc.: P/P.

ALFETI
  · The FETI algorithm itself, see the following tables [Table 2-2] and [Table 2-3].

Table 2-1: Detailed flow chart and call tree before ALFETI
(Each entry of the table below gives: the calling routine — the routines called at level 1 — the routines called at level 2 — the details of the operations — the information relative to the processors.)
ALFETI — FETMPI
  · Determination of the number of procs. and of the rank of the current proc.
  Proc.: MPI_COMM_SIZE, MPI_COMM_RANK.

Loop on the subdomains concerned by the current proc. … End of loop
  · Initialization of the collections of vectors dimensioned to the number of DOFs (physical and late) of each subdomain: '&&FETI.COLLECTIONR' and '&&FETI.COLLECTIONL'. They will be used for the matrix operations of the size of the local problems (the second one is restricted to the matrix-vector product of the preconditioning).
  Proc.: P/P.

FETING — Loop on the subdomains concerned by the current proc. … End of loop
  · Constitution of the collection of vectors dimensioned to the number of interface Lagranges of the subdomains, '&&FETI.COLLECTIONI'. It is used to make the link between the numbering of the interface Lagranges in the list of nodes of the subdomains (SDFETI.FETB) and that of the local PROF_CHNO resulting from the symbolic factorization.
  Proc.: P/P.
Loop on the subdomains concerned by the current proc. … End of loop
  · Initialization of the temporary objects related to the reorthogonalization (REORTH, NBREOR…) and of the temporary vectors (K24IRR, K24LAI…), (O/O)
  · Creation of temporary objects to save later JEVEUO calls: K24REX, K24FIV…. (P/P)

FETMPI
  · Reduction then global broadcast of the object MATR_ASSE maître.FETF, so that all the procs. know how many rigid body modes a given subdomain has.
  Proc.: in parallel (tested by the presence of at least 2 processors). O/O. MPI_ALLREDUCE + MPI_SUM.

FETMPI
  · Idem for the monitoring object '&FETI.INFO.STOCKAGE.FVAL', (O/O, MPI_ALLREDUCE + MPI_SUM)
  · While synchronizing, the same thing is done, but without global broadcast, for the other monitoring objects '&FETI.INFO…'. (O/N, MPI_REDUCE + MPI_SUM)
  Proc.: in parallel.

FETGGT — Loop on the subdomains concerned by the current proc. … End of loop — FETREX
  · Construction of the rectangular matrix $G_I := [R_1 B_1 \;\ldots\; R_q B_q]$ (NOMGI='&&FETI.GI.R').
  Proc.: P/P.

FETGGT — FETMPI
  · Construction of the complete $G_I$ by selective gathering towards proc. 0. CAUTION: this is where the constraints come in: STOCKAGE_GI='OUI' is obligatory in parallel, and the subdomains must be distributed contiguously over the procs.
  Proc.: in parallel. O/N. MPI_GATHERV.

FETGGT — BLAS DDOT
  · Construction of the square matrix $G_I^T G_I$ (NOMGGT='&&FETI.GITGI.R').
  Proc.: if proc. 0. O/N.

FETMON
  · Monitoring if INFOFE(9:9) is activated: sizes of the subdomains, profiling of their CPU times for assembly, factorization…
  Proc.: if proc. 0. O/N.
FETINL — Loop on the subdomains concerned by the current proc. … End of loop
  · Construction of the vector $e := [B_1^T f_1 \;\ldots\; B_q^T f_q]^T$ (K24ER='&&FETINL.E.R').
  Proc.: P/P.

FETINL — FETMPI
  · Construction of the complete $e$ by reduction towards proc. 0.
  Proc.: in parallel. O/N. MPI_REDUCE + MPI_SUM.

FETINL — FETREX and BLAS DAXPY (if STOCKAGE_GI='OUI'), LAPACK DSPTRF/S
  · Calculation of the initial interface Lagrange vector $\lambda^0 := G_I (G_I^T G_I)^{-1} e$ (VLAGI/K24LAI/ZR(IVLAGI)).
  Proc.: if proc. 0. O/N.
FETINL
  · Distribution of $\lambda^0$ to all the procs.
  Proc.: in parallel. O/O. MPI_BCAST.

FETRIN OPTION=1 — Loop on the subdomains concerned by the current proc., BLAS DAXPY, FETREX, RLTFR8, End of loop
  · Calculation of the initial residual $r^0 := \sum_i R_i (K_i)^+ (f_i - R_i^T \lambda^0)$ (K24IRR='&&FETI.RESIDU.R'/ZR(IRR)).
  Proc.: P/P.

FETRIN OPTION=1 — FETMPI
  · Construction of the complete $r^0$ by reduction towards proc. 0.
  Proc.: in parallel. O/N. MPI_REDUCE + MPI_SUM.
FETPRJ OPTION=1 — BLAS DGEMV/DCOPY, LAPACK DSPTRS, FETREX and BLAS DAXPY (if STOCKAGE_GI='OUI')
  · Calculation of the initial projected residual $g^0 := P r^0 = [I - G_I (G_I^T G_I)^{-1} G_I^T] r^0$ (K24IRG='&&FETI.REPROJ.G'/ZR(IRG)).
  Proc.: if proc. 0. O/N.

FETPRJ — FETMPI
  · Distribution of $g^0$ to all the procs.
  Proc.: in parallel. O/O. MPI_BCAST.

FETSCA
  · Scaling of the initial projected residual $\tilde g^0 := A g^0$ (K24IR1/ZR(IR1)).
  Proc.: O/O.

FETPRC — Loop on the subdomains concerned by the current proc., BLAS DAXPY, FETREX, MRMULT, End of loop
  · Calculation of the initial preconditioned projected residual $M^{-1} \tilde g^0$ (K24IR2='&&FETI.VECNBI.AUX2'/ZR(IR2)).
  Proc.: P/P.

FETPRC — FETMPI
  · Construction of the complete $M^{-1} \tilde g^0$ by reduction towards proc. 0.
  Proc.: in parallel. O/N. MPI_REDUCE + MPI_SUM.

FETSCA
  · Scaling of the initial preconditioned projected residual $\tilde h^0 := A M^{-1} \tilde g^0$ (K24IR3='&&FETI.VECNBI.AUX3'/ZR(IR3)).
  Proc.: if proc. 0. O/N.

FETPRJ OPTION=1 — BLAS DGEMV/DCOPY, LAPACK DSPTRS, FETREX and BLAS DAXPY (if STOCKAGE_GI='OUI')
  · Calculation of $h^0 := P \tilde h^0$ (K24IRH='&&FETI.REPCPJ.H'/ZR(IRH)).
  Proc.: if proc. 0. O/N.

FETPRJ — FETMPI
  · Distribution of $h^0$ to all the procs.
  Proc.: in parallel. O/O. MPI_BCAST.
BLAS DCOPY
  · The variable $p^0$ receives $h^0$ (K24IRP='&&FETI.DD.P'/ZR(IRP)).
  Proc.: O/O.

BLAS DDOT, BLAS DNRM2
  · Calculation of the numerator of the descent parameter $\alpha_N^0 := g^0 \cdot p^0$ (ALPHAN), (if proc. 0, O/N)
  · Calculation of the initial norm of the projected residual $\|g^0\|$ (ANORM) and of the stopping criterion $\varepsilon_k := \text{RESI\_RELA} \cdot \|g^0\|$ (EPSIK). (if proc. 0, O/N)

FETMPI
  · Distribution of ANORM to all the procs. and calculation of EPSIK.
  Proc.: in parallel. O/O. MPI_BCAST.

  · Preparation of the JEVEUX object CRITER.
  Proc.: O/O.
FETRIN OPTION=2, FETPRJ OPTION=2, FETPRJ, FETMPI — Loop on the subdomains concerned by the current proc., BLAS DAXPY, FETREX, RLTFR8, sub-loops on the physical and late LIGRELs of the local CHAM_NOs, End of the loops, FETMPI
  Stopping test if the residual $\|g^0\|$ is quasi-zero (i.e. lower than R8MIEM()**(2.D0/3.D0)):
  · Calculation of $\alpha^{sol} := [G_I^T G_I]^{-1} G_I^T r^0$ (K24ALP='&&FETI.ALPHA.MCR'), (if proc. 0, O/N)
  · Distribution of $\alpha^{sol}$ to all the procs. (variables K24ALP/ZR(IALPHA)), (in parallel, O/O, MPI_BCAST)
  · Calculation of the local solution $u_i^{sol} := (K_i)^+ (f_i - R_i^T \lambda^0) - B_i \alpha_i^{sol}$ (K24IRR/ZR(IRR)), (P/P)
  · Reconstruction of the CHAM_NO $u_i^{sol}$, the slave solution specific to the subdomain (CHAMLS/ZR(IVALCS)), (P/P)
  · Reconstruction of the CHAM_NO $u^{sol}$, the master solution associated with the proc.: for the physical nodes, their contribution is added after being divided beforehand by the geometrical multiplicity of the node in question (K24VAL/ZR(IVALS)), (P/P)
  · Construction of the complete $u^{sol}$ by reduction towards proc. 0. (in parallel, O/N, MPI_REDUCE + MPI_SUM)
FETARP — FETPRJ, FETFIV, ARPACK DNAUPD/DNEUPD, BLAS DCOPY
  · Test of the positive-definiteness of the interface operator $P F_I P$ if INFO_FETI(7:7) is activated ([U4.50.01 §3.5]).
  Proc.: in sequential mode.

  · Allocation of the large objects related to the reorthogonalization: K24PSR='&&FETI.PS.REORTHO.R', K24DDR='&&FETI.DD.REORTHO.R', K24FIR='&&FETI.FIDD.REORTHO.R'.
  Proc.: if proc. 0. O/N.

Loop on the GCPPC iterations
  · FETI algorithm, level 2: see table [Table 2-3].

Table 2-2: Detailed flow chart and call tree in ALFETI (level 1)
(Each entry of the table below gives: the calling routine — the routines called at level 1 — the routines called at level 2 — the details of the operations — the information relative to the processors.)
ALFETI — BLAS DCOPY
  · If reorthogonalization, storage of the descent direction $p^k$ in K24DDR.
  Proc.: if proc. 0. O/N.

Loop on the GCPPC iterations — FETFIV — Loop on the subdomains concerned by the current proc., BLAS DAXPY, FETREX, RLTFR8, End of loop
  · Calculation of the result of the FETI interface operator applied to the descent direction: $z^k := \sum_i R_i (K_i)^+ R_i^T p^k$ (K24IRZ='&&FETI.FIDD.Z'/ZR(IRZ)).
  Proc.: P/P.

FETFIV — FETMPI
  · Construction of the complete $z^k$ by reduction towards proc. 0.
  Proc.: in parallel. O/N. MPI_REDUCE + MPI_SUM.

BLAS DCOPY
  · If reorthogonalization, storage of $z^k$ in K24FIR.
  Proc.: if proc. 0. O/N.
BLAS DDOT
  · Calculation of the denominator of the current descent parameter $\alpha_D^k := p^k \cdot z^k$ (ALPHAD),
  · Calculation of the current descent parameter $\alpha^k := \alpha_N^k / \alpha_D^k$ (ALPHA).
  Proc.: if proc. 0. O/N.

Idem
  · If reorthogonalization, storage of ALPHAD in K24PSR.
  Proc.: if proc. 0. O/N.

FETTOR — FETPRJ, BLAS DDOT, DCOPY
  · Test of the orthogonality properties of the GCPPC if INFO_FETI(8:8) is activated ([U4.50.01 §3.5]).
  Proc.: if proc. 0. O/N.

BLAS DAXPY, FETPRJ OPTION=1, DAXPY
  · Update of the current interface Lagrange vector $\lambda^{k+1} := \lambda^k + \alpha^k p^k$ (K24LAI/ZR(IVLAGI)), (if proc. 0, O/N)
  · Calculation of the projected intermediate vector $P z^k$ (K24IR1='&&FETI.VECNBI.AUX1'/ZR(IR1)), (if proc. 0, O/N)
  · Update of the projected residual vector $g^{k+1} := g^k - \alpha^k P z^k$ (ZR(IRG)). (if proc. 0, O/N)
FETMPI
  · Distribution of $g^{k+1}$ to all the procs.
  Proc.: in parallel. O/O. MPI_BCAST.

BLAS DNRM2
  · Calculation of the norm of the projected residual $\|g^{k+1}\|$ (ANORM).
  Proc.: O/O.
FETRIN OPTION=1, FETRIN OPTION=2, FETPRJ OPTION=2 — Loop on the subdomains concerned by the current proc., BLAS DAXPY, FETREX, RLTFR8, End of loop, FETMPI
  Stopping test if $\|g^{k+1}\| < \varepsilon_k$:
  · Recalculation of the residual with the solution interface vector: $r^{sol} := \sum_i R_i (K_i)^+ (f_i - R_i^T \lambda^{k+1})$ (K24IRR/ZR(IRR)), (O/O, then P/P)
  · Construction of the complete $r^{sol}$ by reduction towards proc. 0, (in parallel, O/N, MPI_REDUCE + MPI_SUM)
  · Calculation of $\alpha^{sol}$ and reconstruction of the master and slave solution CHAM_NOs, as in the stopping test of table [Table 2-2].
FETSCA
  · Scaling of the current projected residual $\tilde g^{k+1} := A g^{k+1}$ (K24IR1/ZR(IR1)).
  Proc.: O/O.

FETPRC — Loop on the subdomains concerned by the current proc., BLAS DAXPY, FETREX, MRMULT, End of loop
  · Calculation of the current preconditioned projected residual $M^{-1} \tilde g^{k+1}$ (K24IR2='&&FETI.VECNBI.AUX2'/ZR(IR2)).
  Proc.: P/P.

FETPRC — FETMPI
  · Construction of the complete $M^{-1} \tilde g^{k+1}$ by reduction towards proc. 0.
  Proc.: in parallel. O/N. MPI_REDUCE + MPI_SUM.
FETSCA
  · Scaling of the current preconditioned projected residual $\tilde h^{k+1} := A M^{-1} \tilde g^{k+1}$ (K24IR3='&&FETI.VECNBI.AUX3'/ZR(IR3)).
  Proc.: if proc. 0. O/N.

FETPRJ OPTION=1
  · Calculation of the current projected vector $h^{k+1} := P \tilde h^{k+1}$ (K24IRH/ZR(IRH)).
  Proc.: if proc. 0. O/N.

FETREO — BLAS DDOT, DAXPY, DCOPY
  · Update of the current descent direction $p^{k+1}$ (ZR(IRP)), reorthogonalized or not with respect to the preceding directions, (if proc. 0, O/N)
  · Calculation of the numerator of the current descent parameter $\alpha_N^{k+1} := g^{k+1} \cdot p^{k+1}$ (ALPHAN). (if proc. 0, O/N)

FETREO — FETMPI
  · Distribution of ZR(IRP) to all the procs.
  Proc.: in parallel. O/O. MPI_BCAST.

End of the GCPPC loop
  · Cleaning of the JEVEUX objects according to the option and to the number of procs.

Table 2-3: Detailed flow chart and call tree in ALFETI (level 2)
3
Installation of parallelism
First of all, the FETI algorithm was coded in sequential mode; this implementation was then adapted to support message-passing parallelism with MPI-1. Indeed, the priority was to measure the impact of such a multi-domain solver on the architecture and the SDs of the code, to limit the consequences (legibility, efficiency, maintainability) and to make sure of its correct operation on standard cases. Moreover, for many authors, such a solver often proves very effective (in CPU and in memory occupation), even in sequential mode, as the number of DOFs grows (cf. [bib2]).
The parallelization strategy was then as follows:
· Upstream of the operator calling the FETI solver (MECA_STATIQUE…), all the processors carry out the same sequence of operations and thus know the same JEVEUX objects: mesh, materials, fields resulting from the pretreatments, SD FETI… This is relatively sub-optimal but, taking the architecture of the code and its current use into account, it is the only possible option. It nevertheless has the merit of not impacting the sequential code and, when the pretreatments are inexpensive in CPU and memory compared with the solver, it is also often the strategy retained by the developers of parallel codes.
· Once inside the operator, the operations carried out jointly by the processors are routed by metering the volume of data assigned to each of them. This holds from the preparation of the solver data to the symbolic and numerical factorizations, via the assemblies (of the matrix and of the contributions to the second members) and, of course, the resolution algorithm itself. It is done very simply, without any particular message passing, via the object '&&FETI.LISTE.SD.MPI', which filters the loops over the subdomains:
      CALL JEVEUO('&&FETI.LISTE.SD.MPI','L',ILIMPI)
      DO 50 I = 1, NBSD            <loop on the subdomains>
        IF (ZI(ILIMPI+I).EQ.1) THEN
          ....                     <the expected instructions are carried out
                                    only if the subdomain is contained in the
                                    perimeter of the current proc.>
        ENDIF
   50 CONTINUE
Concerning the large usual JEVEUX objects, each proc. thus builds only the data it needs: the master SOLVEUR SD and the slave ones depending on the perimeter of the current processor, and the same for the NUME_DDL, the MATR_ASSE and the CHAM_NO. On the other hand, the small-volume data are computed by all the procs., because they often steer the calculation and it is of course important that all the procs. follow the same software routing.
Moreover, since the NUME_DDL of each subdomain is known only to its host processor, the message passing is done with vectors of homogeneous sizes: the number of interface DOFs during the algorithm, or the total number of DOFs during the final reconstruction phase.
The master processor manages the reorthogonalization and projection stages and their (potentially) large associated JEVEUX objects.
The communication cost is roughly:
· Initialization: 3 MPI_REDUCE (size nbi, where nbi is the number of interface Lagranges, i.e. the size of the FETI problem to be solved) + 4 MPI_BCAST (size nbi) + 1 MPI_GATHERV.
· At each GCPPC iteration: 2 MPI_REDUCE (size nbi) + 2 MPI_BCAST (size nbi).
· Final reconstruction: 2 MPI_REDUCE (size nbi) + 2 MPI_BCAST (size nbi) + 1 MPI_REDUCE (size nbddl, where nbddl is the total number of unknowns (physical and late) of the problem).
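To illustrate the reduce-then-broadcast pattern that produces these costs, the sketch below shows, in simplified form, the two MPI collectives used at each GCPPC iteration on a vector of size nbi: the local contributions are summed on proc. 0, proc. 0 alone updates the result, and the updated vector is then redistributed to all the procs. This is only a minimal, hedged illustration and not an extract of the Code_Aster sources; the names ZLOC, ZGLO and the constant NBI are assumptions introduced for the example.

      PROGRAM FETCOM
C     Minimal sketch (not Code_Aster source) of the collective pattern used
C     at each GCPPC iteration: local contributions of size NBI are summed on
C     proc. 0 (MPI_REDUCE), proc. 0 alone updates the result (projection,
C     preconditioning...), then the updated vector is redistributed to all
C     the procs. (MPI_BCAST). ZLOC, ZGLO and NBI are illustrative names.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER NBI, IER, RANG, NBPROC, I
      PARAMETER (NBI = 100)
      DOUBLE PRECISION ZLOC(NBI), ZGLO(NBI)
      CALL MPI_INIT(IER)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANG, IER)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NBPROC, IER)
C     Each proc. fills ZLOC with the contributions of its own subdomains
      DO 10 I = 1, NBI
        ZLOC(I) = DBLE(RANG+1)
   10 CONTINUE
C     Sum of the local contributions on proc. 0 (one message of size NBI)
      CALL MPI_REDUCE(ZLOC, ZGLO, NBI, MPI_DOUBLE_PRECISION,
     &                MPI_SUM, 0, MPI_COMM_WORLD, IER)
C     Proc. 0 alone would update ZGLO here (projection, scaling...)
C     Redistribution of the updated vector to all the procs. (size NBI)
      CALL MPI_BCAST(ZGLO, NBI, MPI_DOUBLE_PRECISION, 0,
     &               MPI_COMM_WORLD, IER)
      CALL MPI_FINALIZE(IER)
      END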
Rather than loops of point-to-point communications between the slave processors and the master (MPI_SEND/RECV), collective communications (MPI_REDUCE…) were retained in a first approach; they encapsulate the former and handle the synchronization and buffering problems transparently. This ensures better legibility, maintainability and portability but, conversely, they cannot be optimized by overlapping computations and communications, by limiting the latency times or the buffering.
However, in view of the current software architecture, it seems that these optimizations are not that promising, because they depend heavily on the machine configuration, the network configuration, the network card, the MPI implementation and the type of problem. Gains should rather be sought on the side of a purely parallel implementation of the algorithm (without the master proc./slave procs. vision), where the message exchanges would be limited to neighbouring subdomains and would involve smaller data flows.
4 Bibliography
[1] O. BOITEAU: Domain decomposition and parallelism in structural mechanics: state of the art and benchmark for a reasoned implementation in Code_Aster. EDF-R&D internal note HI-23/03/009 (2003).
[2] O. BOITEAU: Management report 2004 for the UMR EDF-CNRS-LaMSID. EDF-R&D internal report CR-I23/2005/006 (2005).