The T3RAS backend in LLVM is based on the MicroBlaze 6.3 architecture, but without hardware floating-point support and without hardware division. The backend provides three variants (T3RAS1T, T3RAS2T, T3RAS4T), which are described below.

To list all variants and features of T3RAS, simply run:
llvm-as < /dev/null | llc -march=T3RAS -mattr=help

which would show:

T3RAS1T - Select the T3RAS1T processor.
T3RAS2T - Select the T3RAS2T processor.
T3RAS4T - Select the T3RAS4T processor.

Available features for this target:

1-delayslot - enables 1 delay slot per branch -default for T3RAS2T.
2-delayslot - enables 2 delay slot per branch.
3-delayslot - enables 3 delay slot per branch -default for T3RAS.
barrel - Implements barrel shifter.
hasnodelay - disables branching delay slots -default for T3RAS4T.
mul - Implements hardware multiplier.

Description:

  • T3RAS1T: single-threaded variant with a simple 5-stage pipeline. This is the default CPU if none is selected.
  • T3RAS2T: dual-threaded variant with a simple 5-stage pipeline.
  • T3RAS4T: quad-threaded variant with a simple 5-stage pipeline.

The 1-delayslot, 2-delayslot, and 3-delayslot features control the number of delay slots the compiler fills after each branch. They do not change anything in the hardware and should be left at their defaults unless you know what you are doing.

hasnodelay disables delay slots, so the compiler generates code that does not use them. Using this on a variant other than T3RAS4T will severely decrease the performance of the generated code.

barrel enables the barrel shifter for all bit-shift operations. Without it, the compiler emits a longer sequence of other instructions that performs the same shift.
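To give a feel for the cost of leaving barrel off, here is a minimal C sketch of a multi-bit left shift built only from one-bit steps. It assumes the fallback is a loop of single-bit shifts, which is how MicroBlaze cores without a barrel shifter usually handle it; the exact sequence the T3RAS backend emits is not shown in this post, and the function name shift_left_by_steps is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the T3RAS backend.
 * Shifts `value` left by `amount` bits using only one-bit steps,
 * the way a core without a barrel shifter has to. */
uint32_t shift_left_by_steps(uint32_t value, unsigned amount)
{
    amount &= 31u;          /* keep the shift amount in the range 0..31 */
    while (amount-- > 0) {
        value += value;     /* adding a value to itself shifts it left by one bit */
    }
    return value;
}
```

A shift by n bits therefore costs roughly n iterations instead of one instruction, which is why enabling barrel matters for shift-heavy code.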

mul enables the hardware multiplier for all multiplications (but not divisions). Without it, the compiler emits a branch to a sequence of instructions that performs the multiplication using simpler operations.
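As an illustration of what "simpler operations" can mean, the following C sketch multiplies two 32-bit values with the classic shift-and-add method. This is only an assumption about the general technique; the routine the T3RAS backend actually branches to is not shown in this post, and the name soft_mul is hypothetical.

```c
#include <stdint.h>

/* Hypothetical shift-and-add multiply, not the backend's actual routine.
 * Returns the low 32 bits of a * b using only add, shift and test. */
uint32_t soft_mul(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u) {
            product += a;   /* add the current multiple of a when the low bit of b is set */
        }
        a <<= 1;            /* next multiple of a: a * 2 */
        b >>= 1;            /* move on to the next bit of b */
    }
    return product;
}
```

The loop runs once per significant bit of b, so a single multiplication can take dozens of instructions when mul is not enabled.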

The T3RAS architecture currently has no hardware division.
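Division therefore has to be carried out in software. The C sketch below shows one common way to do that, bit-by-bit restoring division on unsigned 32-bit values. It is an assumption about the general approach, not the routine used by the T3RAS toolchain; the name soft_udiv and its divide-by-zero convention are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical restoring division, not the toolchain's actual routine.
 * Returns n / d and stores n % d in *rem when rem is not NULL.
 * Assumed convention: for d == 0 the quotient is 0 and the remainder is n. */
uint32_t soft_udiv(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t quotient = 0;
    uint32_t remainder = 0;

    if (d == 0) {
        if (rem != NULL) {
            *rem = n;
        }
        return 0;
    }

    for (int bit = 31; bit >= 0; --bit) {
        /* bring down the next bit of the dividend */
        remainder = (remainder << 1) | ((n >> bit) & 1u);
        if (remainder >= d) {
            remainder -= d;                   /* the divisor fits: subtract it */
            quotient |= (uint32_t)1 << bit;   /* and set this quotient bit */
        }
    }

    if (rem != NULL) {
        *rem = remainder;
    }
    return quotient;
}
```

Each division costs 32 loop iterations of shifts, compares and subtracts, so code that divides frequently will be noticeably slower on T3RAS.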

To prevent hazards, the backend currently inserts nop instructions, which reduces performance. Performance is estimated by comparing the number of assembly instructions in the generated code against an efficient version of the same code. The current expected performance is:

T3RAS1T: 10%
T3RAS2T: 25%
T3RAS4T: 60%
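One way to read these percentages is as the ratio of instructions in an efficient version of the code to instructions in the generated code, nops included. That reading is an assumption, since the post does not give the exact formula; the function name performance_percent and the counts in the usage note are hypothetical.

```c
/* Hypothetical metric, assuming performance is the ratio of instruction counts.
 * ideal_count:     instructions in an efficient hand-written version
 * generated_count: instructions the backend emitted, including inserted nops
 * Returns the ratio as a whole percentage, rounded down. */
unsigned performance_percent(unsigned ideal_count, unsigned generated_count)
{
    if (generated_count == 0) {
        return 0;   /* nothing was generated, so no ratio can be computed */
    }
    return (100u * ideal_count) / generated_count;
}
```

Under this reading, a routine that needs 10 instructions when written efficiently but compiles to 100 instructions would score 10%.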

Performance is expected to improve as the backend matures.

