Exploring the Intermediate Representation of GCC
This week I gained a deeper and clearer understanding of some of the terminology and of the flow within a compiler that I posted about last week.
Basically, a compiler uses a parser to produce an abstract syntax tree of a given source file. It then translates the source code into one or more intermediate representations (IR); the ones used in GCC are GENERIC, GIMPLE, and RTL.
GENERIC is a common representation that is able to represent programs written in all the languages supported by GCC. It is a language-independent tree structure generated by the front end and handed over to the “middle end” while compiling source code into executable binaries. It is produced by eliminating language-specific constructs from the parse tree that is generated from the code. Its purpose is simply to provide a language-independent way of representing an entire function in trees.
GIMPLE is a simplified subset of GENERIC for use in optimization, converted from GENERIC by the “gimplifier” and based on the tree data structure. It is produced by breaking complex expressions down into a three-address representation, in which each statement has at most three operands and temporaries hold the intermediate values. At present, there are three kinds of GIMPLE:
High-level GIMPLE: what the middle end produces when it lowers the GENERIC language that is targeted by all the language front ends.
Low-level GIMPLE: obtained by linearizing all the high-level control flow structures of high-level GIMPLE, including nested functions, exception handling, and loops.
SSA GIMPLE: low-level GIMPLE rewritten in SSA (static single assignment) form.
RTL (Register Transfer Language)
A very low-level intermediate representation used in the back ends of GCC. It is very close to assembly language.
The back end works with the IR to produce code in the target output language, such as assembly.
The following is the flow from source code, through the compiler, to the generated assembly file.
Generation of dump files
I have found that we can dump the output of each pass with a command-line option. This is the general form of the option:
-fdump-<ir>-<passname>
where <ir> can be
tree for intraprocedural passes on GIMPLE,
ipa for interprocedural passes on GIMPLE, or
rtl for intraprocedural passes on RTL.
Whereas <passname> can be either
all to see all dumps,
ssa for static single assignment, or