Compiler

From HPC Wiki

Latest revision as of 14:14, 27 November 2020

A compiler is a computer program that translates code from one programming language to another.



General

When people write applications, they usually use a text editor and a high-level language such as C/C++ or Fortran to produce code that looks somewhat like this:

[Figure: Schematic of the compile process]
#include <stdio.h>

int main()
{
   printf("Hello, World!\n");
   return 0;
}


This is easy for humans to write, understand and maintain. However, since a computer only understands 0s and 1s (machine code), it cannot execute this code directly. A compiler translates the code into a binary file that can be executed.

With the emergence of higher-level programming languages, the entry barrier into programming has been significantly lowered. This facilitates the creation of more complex programs, which cannot (easily) be written by humans in terms of 0s and 1s alone.

Basic Usage

You usually use a compiler by calling it from the shell:

$ cc hello_world.c -o hello_world

where you feed it the source file hello_world.c and let it create the executable binary hello_world, which you can then run by calling

$ ./hello_world

producing the desired output

Hello, World!

Most compilers offer optimization flags such as -O2 (levels commonly range from -O0 to -O3), with which the compiler analyzes what your program does and looks for more efficient ways of doing it. At the latest when development is nearing completion and you begin to use your program productively, optimization should be turned on so that the software runs as fast as possible. Often a lot of money is invested in hardware where investing in better software would be more cost-efficient; for example, too little effort is spent on choosing the right optimization flags.

To avoid surprises, it is advisable to always compile with a significant optimization level, and to switch to a lower level such as -O0 only when tracking down an error, or to reduce compile times during development stages that require frequent recompilation.

When compiling an application (target) from multiple files, one might need to use another program called the linker to bind the different parts together. A handy tool to automate the process of compiling and linking is a build system, such as make, which manages build dependencies between different compilation units. When dealing with more complex applications, build system generators, such as Autotools or CMake, may prove valuable to handle such dependencies.

Intel Compiler

The Intel Compiler (icc, ifort) is a compiler suite developed by Intel and optimized to make full use of the features of Intel microprocessors, sometimes resulting in significantly higher performance than other compilers. It is usually invoked as follows (you may have to load the corresponding module beforehand):

$ icc file.c [file2.c ...] [options]
$ ifort file.f90 [file2.f90 ...] [options]

GNU Compiler Collection

The GNU Compiler Collection (GCC) is a free collection of compilers, originally written for the GNU operating system and now available on all major platforms. Its C and Fortran compilers are usually invoked as follows (you may have to load the corresponding module beforehand):

$ gcc file.c [file2.c ...] [options]
$ gfortran file.f90 [file2.f90 ...] [options]

LLVM

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Its C/C++ compiler, Clang, is usually invoked as follows (you may have to load the corresponding module beforehand):

$ clang [files] [options]
(As of early 2020 there is no LLVM Fortran compiler available, although the flang project is promising.)

PGI

The PGI compilers by NVIDIA are useful for GPU programming; their main feature is mature support for OpenACC pragmas.

Further compilers

Over the years, many compilers have played a role in the computing world, and some are still useful in corner cases today. To name a few:

- Oracle Developer Studio (previously: Sun Studio)
- NAG Fortran Compiler (picky, but didactic)
- g77 (old implementation of GNU Fortran, mostly compatible with current compilers)
- Microsoft compilers (on MS Windows)

... and many other commercial compiler suites.