Code_Aster® Version 8.1
Title: Measuring performance (CPU) on AlphaServer or Linux
Date: 02/11/05
Author(s): M. ABBAS, J. PELLET
Key: D1.05.01-B
Organization(s): EDF-R&D/AMA
Data-Processing Description Manual
D1.05 booklet: -
Document D1.05.01
HT-66/05/003/A

Measuring performance (CPU) on AlphaServer or on Linux

Summary:

Tools are available for tracing the CPU time consumed (profiling) in Code_Aster.

On AlphaServer, these tools do not require recompiling the Aster sources: one uses the tool atom. The drawback of this AlphaServer-specific tool is that instrumenting the executable incurs an execution overhead that can be very large (up to 10 times the original cost). Under these conditions, it is difficult to be sure that the measurement is relevant.

On Linux, one uses the traditional method: one recompiles all the sources with the option "-pg" and uses the tool gprof. The overhead of the instrumentation is negligible.
1 On AlphaServer
1.1 Preparing the profiling
One starts from the executable that one intends to use to run the Aster study:
· Native executable: copy the Aster executable into a local directory that belongs to you on the server (the native executable is in /aster/v7/NEW7/ on the server and is named asterd or asteru, depending on whether debug mode is used or not).
· Private executable: prepare your overload as usual with ASTK or run_aster and build your executable.
The executable must then be modified with the tool atom.
In the directory containing the Aster executable that you want to profile:
atom -tool hiprof votre_executable
The program creates a new executable named <votre_executable.hiprof>.
1.2 Running the profiling
For ASTK or run_aster, the new executable must be used as an overload. It is imperative to modify the Aster launch script because, during the profiling run, the file <votre_executable.hiout> is created in the temporary directory of the calculation. It must therefore be copied to a suitable directory.
For ASTK:
1) Prepare the study (files, catalog overloads, new "profiled" executable, bases, time, memory and miscellaneous options).
2) Add a btc script as a RESULT in the OVERLOADS tab.
3) Launch the calculation. The calculation will not be run (a dialog box informs you), but the btc script will be created.
4) Modify the btc script by editing it and adding the following line at the end:
cp votre_executable.hiout /chez_vous/votre_executable.hiout
Caution! After any change to the execution profile (in particular time and memory), it is essential to recreate the btc script and to modify it again.
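Since the btc script has to be edited again each time it is recreated, step 4 can be automated. The following is only a minimal sketch: the function name append_copy_line and the example paths are hypothetical, and the exact layout of a btc script is not assumed beyond it being a text file of shell commands.

```python
from pathlib import Path


def append_copy_line(btc_path, hiout_name, destination_dir):
    """Append the 'cp' line of step 4 to a btc script.

    Returns True if the line was added, False if it was already present.
    All names and paths are supplied by the caller (hypothetical here).
    """
    script = Path(btc_path)
    copy_line = f"cp {hiout_name} {destination_dir.rstrip('/')}/{hiout_name}"
    text = script.read_text()
    if copy_line in text.splitlines():
        return False  # already patched, do not duplicate the line
    if text and not text.endswith("\n"):
        text += "\n"  # make sure the new command starts on its own line
    script.write_text(text + copy_line + "\n")
    return True
```

Calling the function twice on the same script leaves a single copy line, so it can be run safely after each regeneration of the btc.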
After execution, one has two files:
votre_executable.hiout
votre_executable.hiprof
These two files must be in the same directory. One then runs gprof, redirecting the standard output:
gprof votre_executable votre_executable.hiout > ResultatProfil
You now have a file <ResultatProfil> that contains the result of the analysis.
For the available options, see man gprof. Some useful options:
gprof -a
Suppresses the display of static functions, in particular the system calls that clutter the file.
gprof -a -F jeveuo_
Limits the display to the designated function.
Caution:
For a FORTRAN routine, it is imperative to add an _ (underscore) at the end of the routine name and to remove the .f extension.
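The naming rule above (drop the .f extension, append an underscore) is easy to get wrong by hand. A minimal sketch of it follows; the helper names fortran_symbol and gprof_arguments are hypothetical, and the argument list simply reproduces the command form shown in this section.

```python
def fortran_symbol(routine):
    """Turn a FORTRAN routine or file name into the symbol name gprof expects.

    'jeveuo.f' -> 'jeveuo_' ; 'jeveuo' -> 'jeveuo_' ; 'jeveuo_' is left as is.
    """
    name = routine[:-2] if routine.endswith(".f") else routine
    return name if name.endswith("_") else name + "_"


def gprof_arguments(executable, hiout, routine):
    """Build the argument list for 'gprof -a -F <symbol> <executable> <hiout>'."""
    return ["gprof", "-a", "-F", fortran_symbol(routine), executable, hiout]
```

The returned list can be passed to subprocess.run with stdout redirected to a file to reproduce the redirection to ResultatProfil.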
1.3 Analyzing the profiling results
By default, the file is large. The amount of information displayed can be limited with the gprof options. "System times" are expressed as a number of instructions executed.
Let us go through the file in some detail, starting from the end:
***************************************************************************
Index by function name
[401] PyArg_Parse [591] cftabl_ [1000] proc_at_0x1213acb50
[212] PyArg_ParseTuple [84] cftyli_ [660] proc_at_0x1213ad470
[1137] PyArg_ParseTupleAnd [310] cgmacy_ [453] proc_at_0x1213ad560
[1605] PyBuffer_FromObject [79] charme_ [680] proc_at_0x1213aeac0
[1256] PyCFunction_Fini [476] chlici_ [1221] proc_at_0x1213aedc0
[531] PyCFunction_New [190] chloet_ [217] proc_at_0x1213b18e0
[1549] PyCObject_AsVoidPtr [226] chmano_ [629] proc_at_0x1213b1e00
Each function called during the execution is identified by a number in square brackets.
Just above this index:
***************************************************************************
granularity: instructions; units: inst's; total: 201924201580.70 inst's
<A> <B> <C> <D> <E> <F> <G>
49.6 100384307222 100384307222 161 623505013 623596299 tldlr8_ [16]
31.0 163144941823 62760634601 506 124032874 124101882 rldlr8_ [17]
This table lists the functions in decreasing order of the number of instructions executed.
COLUMN <A>: percentage of the number of instructions executed by this function relative to the total of the execution.
COLUMN <B>: number of instructions accumulated by this function and those that precede it in the table.
COLUMN <C>: number of instructions for this function.
COLUMN <D>: number of calls to this function.
COLUMN <E>: ratio of column <C> to column <D> (the average number of instructions per call of the function).
COLUMN <F>: average number of instructions per call of the function and of its descendants.
COLUMN <G>: name of the function and its reference number (in square brackets).
In this example, the function tldlr8 took 49.6% of the total of the calculation while being called 161 times.
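The relations between the columns can be checked on the two rows shown above. The sketch below is only an illustration: the FlatRow type and parse_flat_row function are hypothetical helpers, and they assume that a row consists of exactly eight whitespace-separated fields, as in this listing.

```python
from dataclasses import dataclass


@dataclass
class FlatRow:
    percent: float       # column <A>
    cumulative: int      # column <B>
    own: int             # column <C>
    calls: int           # column <D>
    own_per_call: int    # column <E>
    total_per_call: int  # column <F>
    name: str            # column <G>, function name
    index: int           # column <G>, reference number


def parse_flat_row(line):
    """Split one row of the table into its columns."""
    a, b, c, d, e, f, name, index = line.split()
    return FlatRow(float(a), int(b), int(c), int(d), int(e), int(f),
                   name, int(index.strip("[]")))


ROWS = [parse_flat_row(text) for text in (
    "49.6 100384307222 100384307222 161 623505013 623596299 tldlr8_ [16]",
    "31.0 163144941823 62760634601 506 124032874 124101882 rldlr8_ [17]",
)]

for row in ROWS:
    # Column <E> is column <C> divided by column <D>, truncated to an integer.
    assert row.own // row.calls == row.own_per_call

# Column <B> of a row is column <B> of the previous row plus its own column <C>.
assert ROWS[0].cumulative + ROWS[1].own == ROWS[1].cumulative
```

On these two rows, column <E> matches the truncated quotient of <C> by <D>, and column <B> of the second row is the running total of column <C>.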
***************************************************************************
Lastly, at the beginning of the file is the complete call tree. It is sorted either by call order (starting from main and going down) or by function (see the gprof options).
***************************************************************************
Let us take the example of tldlr8:
<A> <B> <C> <D> <E> <F>
100263313681.76 14679301.29 161/161 tldlgg_ [15]
[16] 49.7 100263313681.76 14679301.29 161 tldlr8_ [16]
3129121.03 6207534.02 4485/30537 __upcUpcall [352]
35974.59 2749927.50 522/195235 jelibe_ [65]
192341.36 1770419.18 1005/775659 jeveuo_ [56]
47302.73 140745.02 161/202579 jedema_ [102]
18938.92 126525.05 322/63148 jeexin_ [196]
27722.26 85430.33 94/49118 jeecra_ [154]
17033.41 67779.29 94/13206 jecreo_ [257]
45068.75 84.88 1044/1075446 jexnum_ [163]
13618.68 2023.63 161/202581 jemarq_ [205]
1710.66 0.00 161/3481 infniv_ [853]
An entry of the call tree is identified by the number in square brackets on the left. Here, the number [16] denotes the function tldlr8_ (as indicated in the index at the end of the file, for example).
It is the reference function (the node of the tree).
The lines above it are the callers of this function (the parent functions); those below it are the functions it calls (the child functions).
Each function has two main figures: the number of instructions executed in the function itself ("terminal" FORTRAN instructions) and the number of instructions executed in its child functions.
    Parent function
    Parent function
Reference function
    Child function
    Child function
    Child function
    Child function
For the reference function:
COLUMN <A>: identification number of the reference function.
COLUMN <B>: the figure 49.7 is the percentage of the number of instructions executed by this reference function relative to the total of the execution (same as in the preceding table).
COLUMN <C>: number of instructions for the reference function itself.
COLUMN <D>: number of instructions for the child functions of the reference function.
COLUMN <E>: number of times the function was called.
COLUMN <F>: name of the reference function.
For the parent functions and the child functions:
COLUMN <A>: empty
COLUMN <B>: empty
COLUMN <C>: number of instructions for the function itself.
COLUMN <D>: number of instructions for the descendants of the function.
COLUMN <E>: gives two figures a/b whose meaning depends on the type of function (parent or child of the reference function):
· For the parent functions (above the reference function), a/b: <a> is the number of times the reference function was called by this parent function, out of the total number <b> of calls to the reference function.
· For the child functions (below the reference function), a/b: <a> is the number of times the child function was called by the reference function, out of the total number <b> of calls to the child function.
COLUMN <F>: name of the function.
Note:
· If the number of instructions for the descendants of a function is zero, the function makes no other calls. One is at the "end" of the tree: the function contains only basic FORTRAN instructions (this is the case of infniv, for example).
· For a given reference function, the sum of the <a> values in column <E> of its parent functions gives the total number of calls to the reference function.
· For a given reference function, the sum of columns <C> and <D> of its child functions gives the figure in column <D> of the reference function.
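The three remarks of this note can be verified on the tldlr8 example. The sketch below copies the figures of the listing as strings and uses exact decimal arithmetic; the function names are hypothetical and each tuple holds (column <C>, column <D>, column <E>, name).

```python
from decimal import Decimal

PARENTS = [("100263313681.76", "14679301.29", "161/161", "tldlgg_")]
REFERENCE = ("100263313681.76", "14679301.29", "161", "tldlr8_")
CHILDREN = [
    ("3129121.03", "6207534.02", "4485/30537", "__upcUpcall"),
    ("35974.59", "2749927.50", "522/195235", "jelibe_"),
    ("192341.36", "1770419.18", "1005/775659", "jeveuo_"),
    ("47302.73", "140745.02", "161/202579", "jedema_"),
    ("18938.92", "126525.05", "322/63148", "jeexin_"),
    ("27722.26", "85430.33", "94/49118", "jeecra_"),
    ("17033.41", "67779.29", "94/13206", "jecreo_"),
    ("45068.75", "84.88", "1044/1075446", "jexnum_"),
    ("13618.68", "2023.63", "161/202581", "jemarq_"),
    ("1710.66", "0.00", "161/3481", "infniv_"),
]


def calls_from_parents(parents):
    """Sum of the <a> values of column <E> over the parent functions."""
    return sum(int(calls.split("/")[0]) for _, _, calls, _ in parents)


def cost_of_children(children):
    """Sum of columns <C> and <D> over the child functions."""
    return sum((Decimal(own) + Decimal(desc) for own, desc, _, _ in children),
               Decimal("0"))


def leaf_functions(children):
    """Child functions whose descendants cost zero instructions."""
    return [name for _, desc, _, name in children if Decimal(desc) == 0]


assert calls_from_parents(PARENTS) == int(REFERENCE[2])
assert cost_of_children(CHILDREN) == Decimal(REFERENCE[1])
assert leaf_functions(CHILDREN) == ["infniv_"]
```

The figures are kept as strings and converted with Decimal so that the comparison with column <D> of the reference function is exact and not affected by floating-point rounding.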
Analysis of the example
In the example presented, the function tldlr8 is expensive since, on its own, it accounts for nearly half of the total number of instructions of the execution. One also sees that it is its own instructions that take the time, not the calls to its child functions (the ratio between the two exceeds 1000). As tldlgg is the only function that calls tldlr8, one must look at the call tree of that function. One then sees that the contact/friction algorithm (fropgd) is the greediest (2/3 of the calls to tldlgg are made by the contact algorithm).
2 On Linux
2.1 Instrumentation with f77 -pg (or cc -pg)
On the Linux/Rocks machine clpaster (the PC cluster of the AMA department), fully recompiling the sources is less of a problem than on the AlphaServer: Aster can be recompiled entirely in less than 30 minutes of elapsed time.
To carry out this recompilation with Astk, one must:
· "overload" all the sources (F77 and C). To save time, the F77 sources can be concatenated into "packages" (of 300 routines, for example).
· modify the file "config.txt" to add the option "-pg" on the following 5 lines:
- OPTL | f90 | ? | -v -pg
- OPTC_D | cc | ? | -c -g -pg -DP_LINUX
- OPTC_O | cc | ? | -c -pg -DP_LINUX
- OPTF_D | f90 | ? | -c -g -pg -I/opt/mpich2-1.0.1/include
- OPTF_O | f90 | ? | -c -O2 -pg -I/opt/mpich2-1.0.1/include
The config.txt file modified in this way makes it possible to instrument the code in both "debug" and "nodebug" modes. The "nodebug" mode is a priori preferable for measuring the "true" performance of the code. On the other hand, the "debug" mode is necessary if one wants to identify the most expensive lines.
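The edit of config.txt described above can be sketched as a small filter. This is an illustration only: it assumes that each option line has four fields separated by "|" with the options in the last field, as in the five lines listed; the function name add_pg is hypothetical, and "-pg" is appended at the end of the options rather than inserted at the position shown in the listing.

```python
# The five option lines that must carry -pg, as listed in this section.
PROFILED_KEYS = ("OPTL", "OPTC_D", "OPTC_O", "OPTF_D", "OPTF_O")


def add_pg(line):
    """Return the config.txt line with -pg added to its options if needed.

    Lines that are not one of the five option lines are returned unchanged.
    """
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 4 or parts[0] not in PROFILED_KEYS:
        return line
    options = parts[3].split()
    if "-pg" not in options:
        options.append("-pg")  # assumption: the position of -pg does not matter
    return " | ".join(parts[:3] + [" ".join(options)])
```

Applying add_pg to every line of the file leaves already instrumented lines as they are, so the filter can be run more than once.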
Unfortunately, I observed an unexplained problem in "debug" mode: the profiling result showed call links between routines that did not exist! One can nevertheless hope that this defect does not entirely invalidate the rest of the measurement.
As an example, I profiled the test ssnv506c and obtained the following overall results:
· in nodebug mode without instrumentation: 138 s
· in nodebug mode with instrumentation: 139 s
· in debug mode without instrumentation: 218 s
· in debug mode with instrumentation: 228 s
The instrumentation is seen to have a negligible CPU cost.
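To put a figure on "negligible", the relative overhead can be computed from the four timings above. The function name overhead_percent is a hypothetical helper; the timings are those of the ssnv506c test.

```python
def overhead_percent(time_without, time_with):
    """Relative cost of the instrumentation, in percent of the original time."""
    return 100.0 * (time_with - time_without) / time_without


# Timings in seconds measured on the test ssnv506c.
nodebug_overhead = overhead_percent(138, 139)
debug_overhead = overhead_percent(218, 228)

print(f"nodebug: {nodebug_overhead:.1f} %")
print(f"debug:   {debug_overhead:.1f} %")
```

The overhead is about 0.7% in nodebug mode and about 4.6% in debug mode, far from the factor of up to 10 mentioned for atom on AlphaServer.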

2.2 Running the instrumented code with Astk
Once the instrumentation is done, the study that one wants to "profile" must be run with the executable just produced. The problem is that running the study produces a file (named gmon.out) in the temporary execution directory. This file is therefore lost at the end of the run unless precautions are taken.
To keep the precious gmon.out file, one must use Astk interactively and click the "launch pre" button (instead of the usual "launch run"). This Astk option prepares the execution environment. One then moves to the prepared directory and "launches" Aster manually. This is the same procedure as for using a debugger.
2.3 Analysis of the results
Once the study has been run and the file "gmon.out" recovered, this file can be analyzed with the command:
gprof mon_executable gmon.out > listing
The interpretation of the resulting file (listing) is the same as that described in [§1.3]. An excellent document describing the whole profiling process is the one written by Jay Fenlason and Richard Stallman: "GNU gprof: The GNU Profiler". It is easy to find on the Web.
Note:
Even if all the Aster sources are recompiled, the "depth" of the performance analysis stops at the libraries that are used at link time and were not compiled with "-pg". This is the case, for example, of the BLAS routines. The time spent in these libraries cannot be attributed to the Aster routines that call them. This shortcoming can be significant, for example, when one wants to measure the performance of the MUMPS or MULT_FRONT solvers, because most of the time is spent in BLAS routines.