This week I gained a deeper and clearer understanding of some of the terminology, and of the flow within a compiler, that I posted about last week.

Frontend

The frontend uses a parser to produce an abstract syntax tree for a given source file. It translates the source code into an intermediate representation (IR). GCC uses three IRs: GENERIC, GIMPLE, and RTL.

GENERIC

GENERIC is a common representation that can represent programs written in all the languages supported by GCC. It is a language-independent tree structure generated by the frontend and handed to the “middle end” while compiling source code into executable binaries. It is produced by eliminating language-specific constructs from the parse tree that is generated from the code. Its purpose is simply to provide a language-independent way of representing an entire function in trees.

GIMPLE

GIMPLE is a simplified subset of GENERIC for use in optimization. It is converted from GENERIC by the “gimplifier” and is based on the tree data structure. It is produced by breaking the expressions in the code down into a three-address representation, in which each statement has at most three operands. At present, there are three kinds of GIMPLE:

  • High-level GIMPLE: what the middle end produces when it lowers the GENERIC language that is targeted by all the language front ends.

  • Low-level GIMPLE: obtained by linearizing all the high-level control flow structures of high-level GIMPLE, including nested functions, exception handling, and loops.

  • SSA GIMPLE: low-level GIMPLE rewritten in SSA (static single assignment) form.
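The three-address idea can be sketched by hand in C. This is only an illustration, not actual gimplifier output: the function names original and lowered and the temporaries t1 to t4 are made up for this example, and the real dump uses compiler-generated temporary names. The second function performs the same computation as the first, but every statement has one result and at most two source operands.

```c
/* Hypothetical example: one nested expression, as a programmer would write it. */
int original(int a, int b, int c)
{
    return a + b * c - (a - c);
}

/* The same computation split by hand into three-address statements.
 * Each temporary is assigned exactly once, which is also the property
 * that SSA form enforces for every variable. */
int lowered(int a, int b, int c)
{
    int t1 = b * c;   /* innermost multiplication first */
    int t2 = a + t1;  /* left-hand side of the subtraction */
    int t3 = a - c;   /* parenthesised sub-expression */
    int t4 = t2 - t3; /* final result */
    return t4;
}
```

Both functions should return the same value for any inputs that do not overflow; the lowered form is simply easier for optimization passes to analyze one statement at a time.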

RTL (Register Transfer Language)

RTL is a very low-level intermediate representation used in the backends of GCC. It is very close to assembly language.

Backend 

The backend works with the IR to produce code in the target output language, such as assembly.

The following is the flow from source code, through the compiler, to the generated assembly file.

  1. C/C++
  2. Frontend
  3. GENERIC
  4. GIMPLE
  5. RTL
  6. Assembly File

Generation of dump files

I have found that we can dump the output of each pass by passing an option to GCC. This is the general form:

-fdump-<ir>-<passname>

where <ir> can be tree for intraprocedural passes on GIMPLE, ipa for interprocedural passes on GIMPLE, or rtl for intraprocedural passes on RTL. <passname> can be all to see all dumps, or the name of a single pass, such as ssa for static single assignment or gimple (for example, -fdump-tree-ssa or -fdump-tree-gimple).
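To try the dump options, a small C file is enough. The sketch below is a made-up example (the file name sum.c and the function sum_to are assumptions, not taken from GCC), with a few possible invocations listed in the header comment. The dump files are normally written alongside the output, with the pass name in the file suffix; the exact names and pass numbers vary between GCC versions.

```c
/* sum.c: a tiny translation unit to inspect with GCC's dump options.
 *
 * Possible invocations (the file name sum.c is hypothetical):
 *   gcc -c -fdump-tree-gimple sum.c     dump after gimplification
 *   gcc -c -fdump-tree-ssa sum.c        dump after the rewrite into SSA form
 *   gcc -c -O2 -fdump-tree-all sum.c    dumps for all tree (GIMPLE) passes
 *   gcc -c -O2 -fdump-rtl-all sum.c     dumps for all RTL passes
 */

/* Sum the integers 1..n. The loop variable and the accumulator are
 * reassigned on every iteration, so the SSA dump has something to show. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

Comparing the gimple dump with the ssa dump of the same function is a quick way to see the difference between the GIMPLE forms described earlier.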