Difference between revisions of "Gprof Tutorial"

From HPC Wiki

Latest revision as of 17:48, 3 December 2020

This tutorial covers application performance analysis with the GNU profiler Gprof. Profiling an application gives valuable insight into its structure and exposes performance bottlenecks, which point to the sections of the code where optimization is most effective.

The tutorial covers all necessary basics to get started with Gprof: it shows how to instrument applications, how to generate performance information for an application run and how to evaluate the results. In addition, it explains how to visualize the application structure using call graphs and how to annotate the application's source code with runtime information. Three real-world examples from the areas of biology, computer science and mechanical engineering demonstrate that this works with different programming languages (C/C++, Fortran), different compilers (GNU, Intel) and even parallel applications (threads, MPI).
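
The steps named above (instrument, run, evaluate) can be sketched as a sequence of command lines. The following is a minimal sketch, not taken from the tutorial slides: it assumes the GNU compiler `gcc`, and the names `main.c`, `app` and `input.dat` are hypothetical placeholders. The helper only assembles and prints the commands, so it can be read or adapted without any profiling tools installed.

```python
import shlex


def gprof_workflow(source, binary="app", run_args=()):
    """Return the command lines of a basic Gprof session as argv lists.

    Assumption: GNU toolchain (gcc, gprof). File names are placeholders.
    """
    exe = f"./{binary}"
    return [
        # 1. Instrument: pass -pg when compiling and linking. -g is only
        #    needed for the annotated source listing in step 5.
        ["gcc", "-pg", "-g", "-O2", "-o", binary, source],
        # 2. Run with a representative workload. On normal exit the program
        #    writes gmon.out into the current working directory.
        [exe, *run_args],
        # 3. Flat profile: time spent per function.
        ["gprof", "--brief", "--flat-profile", exe, "gmon.out"],
        # 4. Call graph: which function called which, and how often.
        ["gprof", "--brief", "--graph", exe, "gmon.out"],
        # 5. Source listing annotated with call counts.
        ["gprof", "--annotated-source", exe, "gmon.out"],
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for argv in gprof_workflow("main.c", run_args=["input.dat"]):
        print(shlex.join(argv))
```

Other compilers use their own driver name in step 1, so the first command is the one most likely to need adjusting.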


(Slides as PDF)

Quiz

1. What languages can Gprof profile?

Python, Java, Julia
C/C++, Fortran, Pascal
Haskell, Cobol, Whitespace

2. How does Gprof generate a performance profile of an application?

by instrumenting the application during compilation
through static analysis of the source code
by means of hardware performance counters

3. What compiler/linker flag is used to instrument the application?

-pg
-pig
--profile

4. Which compilers support Gprof?

only the commercial Intel compilers
only open-source compilers (e.g., from GNU)
many different compilers (e.g., from GNU and Intel)

5. How should the input parameters be chosen when running the instrumented application?

simple and understandable
representative of a usual workload
covering edge cases

6. Which applications can be analyzed with Gprof?

only small examples
up to medium-sized applications with a running time below ~1 hour
all, even large real-world examples with huge running times

7. What is a call graph?

an Android app to show incoming callers
a hierarchy diagram of function calls in a given profile
instructions on how to call for help during emergencies

8. How do you generate a call graph of a Gprof profile?

gprof --call-graph
gprof-call-graph
gprof --graph

9. What is gprof2dot?

a third-party script for call graph visualization via the Graphviz "dot" tool
a Gprof feature to export profiles as a PDF
a fork of the beta version of Gprof2

10. Does Gprof work with parallel applications?

no, Gprof only works with sequential applications
yes, but Gprof cannot differentiate between individual threads or processes
yes, parallel profiling is the main use case of Gprof

11. How much runtime overhead does Gprof produce?

none
little
much