Optimising C for T3RAS
After a lot of testing, I found a few simple practices that can help speed up code.
- Know your target. If the T3RAS core you’re compiling for has hardware features such as a multiplier and a barrel shifter, enable them when compiling with -mattr=barrel,mul. Both instructions take an extra cycle to execute. In most situations, however, using them is still more efficient than expanding each operation into a sequence of simple instructions.
- Don’t use too many variables, especially within the same block. T3RAS has a fixed number of registers. If the number of distinct variables in use at the same time exceeds it, the extra ones are kept in memory and have to be loaded and stored every time they take part in an operation.
- Reuse variables within the same scope wherever possible. This avoids extra loads and stores and makes the generated assembly more efficient.
- Space out dependent operations. For example, with integers A, B and C, the statements can be arranged as A=B; C=B+5; A=A+A; so that the independent C=B+5 sits between the write of A and its next read. If the assembly output keeps that order, there are no data hazards, so you could compile with -mattr=nohazardsolver.
- Adding or removing branches may not affect performance. The T3RAS pipeline does not need to be flushed on a branch if its delay slots are filled with useful instructions.
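To make the "Know your target" tip concrete, here is a rough sketch of the kind of expansion a compiler may fall back to when the core has no multiplier or barrel shifter: loops that spend several simple instructions per bit or per shift position. The names `soft_mul`, `soft_shl` and `hw_mul` are hypothetical, and the exact sequence or helper routine the T3RAS toolchain emits without `mul` or `barrel` is an assumption here, so check your compiler's assembly output.

```c
#include <stdint.h>

/* Hypothetical software multiply: roughly what a compiler may have to emit
 * (inline or as a helper call) when the core has no hardware multiplier.
 * Each iteration costs a test, a conditional add and two shifts. */
uint32_t soft_mul(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u) {
            result += a;   /* add the shifted multiplicand for each set bit of b */
        }
        a <<= 1;           /* next power-of-two multiple of a (wraps mod 2^32) */
        b >>= 1;           /* move on to the next bit of b */
    }
    return result;
}

/* Hypothetical fallback for a variable shift on a core without a barrel
 * shifter: one single-bit shift per position. Assumes n < 32. */
uint32_t soft_shl(uint32_t x, unsigned n)
{
    while (n-- > 0) {
        x <<= 1;
    }
    return x;
}

/* With -mattr=mul this can compile to a single multiply instruction
 * (plus its extra cycle) instead of the loop in soft_mul. */
uint32_t hw_mul(uint32_t a, uint32_t b)
{
    return a * b;
}
```

Both multiply versions give the same result modulo 2^32; the difference is only in how many instructions and cycles the core spends getting there.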
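The two register tips (fewer variables in use at once, reuse where possible) can be illustrated with two functions that compute the same dot product. In the first, all eight inputs are copied into separate variables before any product is summed; in the second, a single accumulator is reused, so only a handful of values need a register at any point. The function names are hypothetical, the number of registers T3RAS actually has is not stated here, and an optimising compiler can often shorten live ranges on its own, so the difference may only show at lower optimisation levels or in larger functions. Compare the assembly of both.

```c
#include <stdint.h>

/* Many values live at once: eight loaded inputs plus four products exist
 * before the final sum, which can exceed the available registers and force
 * spills (extra stores and loads) to memory. */
int32_t dot4_many_live(const int32_t *x, const int32_t *y)
{
    int32_t x0 = x[0], x1 = x[1], x2 = x[2], x3 = x[3];
    int32_t y0 = y[0], y1 = y[1], y2 = y[2], y3 = y[3];
    int32_t p0 = x0 * y0;
    int32_t p1 = x1 * y1;
    int32_t p2 = x2 * y2;
    int32_t p3 = x3 * y3;
    return p0 + p1 + p2 + p3;
}

/* One accumulator reused on every iteration: only the accumulator, the loop
 * counter, the two pointers and one temporary product need to be live. */
int32_t dot_reuse(const int32_t *x, const int32_t *y, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += x[i] * y[i];
    }
    return acc;
}
```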
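The ordering tip can be written out in C using the document's own A, B, C example. In `ordered_with_gap`, the independent statement `c = b + 5` sits between the write of `a` and its next read; in `ordered_back_to_back`, the read immediately follows the write, which is the read-after-write pattern that causes a data hazard. The struct and function names are hypothetical, and whether one instruction of separation is enough depends on the T3RAS pipeline depth, which is assumed here. A compiler may also reorder independent statements, so the C order is only a hint: confirm the order in the generated assembly before relying on `-mattr=nohazardsolver`.

```c
/* Results of one ordering, returned together for comparison. */
typedef struct {
    int a;
    int c;
} abc_result;

/* A = B; C = B + 5; A = A + A;
 * The independent C = B + 5 separates the write of A from its next read. */
abc_result ordered_with_gap(int b)
{
    int a, c;
    a = b;
    c = b + 5;   /* does not depend on a, so it fills the gap */
    a = a + a;   /* reads a one statement after it was written */
    abc_result r = { a, c };
    return r;
}

/* A = B; A = A + A; C = B + 5;
 * A is read immediately after it is written (read-after-write hazard). */
abc_result ordered_back_to_back(int b)
{
    int a, c;
    a = b;
    a = a + a;   /* depends directly on the previous statement */
    c = b + 5;
    abc_result r = { a, c };
    return r;
}
```

Both orderings produce the same values; only the spacing between dependent instructions differs.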
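For the branch tip, here are two hypothetical ways to clamp a value to a maximum: one with an `if` and one without a branch. If the compiler fills the branch delay slots with useful work, the branching version may cost about the same as the branch-free one on T3RAS, so rewriting branches away is not automatically a win. The function names are illustrative only; compare the assembly and cycle counts for your own code.

```c
#include <stdint.h>

/* Branching version: the compare-and-branch may be cheap if the
 * instruction(s) after the branch fill its delay slot(s). */
int32_t clamp_branch(int32_t x, int32_t max)
{
    if (x > max) {
        return max;
    }
    return x;
}

/* Branch-free version using a mask. int32_t is guaranteed to be two's
 * complement, so -(1) is all ones and -(0) is zero. */
int32_t clamp_branchless(int32_t x, int32_t max)
{
    int32_t mask = -(int32_t)(x > max);   /* all ones if x > max, else 0 */
    return (max & mask) | (x & ~mask);    /* pick max or x without branching */
}
```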