While I managed to get the backend to appear with the right name and the features it needed, there were still many shortcomings to address.
After a while of copying, modifying and adapting code, I was finally able to change the names and variants. Instead of distinguishing the variants by their pipeline, I removed the 3-stage pipeline and made copies of the 5-stage pipeline, one for each threaded variant. Since all the variants share the same pipeline, one way to optimise for the processor is to assume that each thread only occupies part of the pipeline. For example, on the 2-threaded variant, if you leave out the instruction at the top of the pipeline, the pipeline looks like a 3-stage pipeline to each thread even though it is really a 5-stage one. When defining the latencies at each stage for instructions/sections, it is therefore easier to define the cycle latencies relative to the thread, as if it were a 3-stage processor: for instance, the result of an arithmetic operation is available after 1 cycle instead of the 2 cycles that were the backend's default. Despite these similarities, each pipeline must still be defined properly.
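To make the latency idea concrete, here is a minimal sketch of the arithmetic. It assumes strict round-robin issue, where each thread issues one instruction every `threads` cycles; that scheduling model is my assumption for illustration. `threadRelativeLatency` is a hypothetical helper, not something from the backend.

```cpp
#include <cassert>

// Hypothetical helper, not part of the backend.
// Assumption: threads issue in strict round-robin order, so a given thread
// issues one instruction every `threads` pipeline cycles.
// Returns how many of the thread's own issue slots pass before a result
// that takes `nativeCycles` pipeline cycles becomes available.
int threadRelativeLatency(int nativeCycles, int threads) {
  assert(nativeCycles >= 0 && threads >= 1);
  // Ceiling division: smallest k such that k * threads >= nativeCycles.
  return (nativeCycles + threads - 1) / threads;
}
```

Under that assumption, a 2-cycle arithmetic result on the 2-threaded variant gives `threadRelativeLatency(2, 2) == 1`, which matches the 1-cycle figure above, while the 1-threaded variant keeps the default of 2.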
The only problem now is changing the Intrinsics Info and ISelLowering inside the ported backend to support the 4-threaded variant, where branching does not require delay slots. Branch instructions come in two forms, for example ‘bri’ for a branch without a delay slot and ‘brid’ for a branch with one. On a 4-threaded variant with the same pipeline, a branch appears to take effect within the next cycle for that thread, so no delay slot is needed. Currently, all branch instructions use delay slots automatically.
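One way to think about which branch form each variant needs is sketched below. It again assumes strict round-robin issue and a branch delay of 3 cycles on the shared pipeline, which is the figure for the 1-threaded variant. `visibleDelaySlots` and `selectBranchMnemonic` are hypothetical names used only for illustration; the real change would live in the backend's lowering code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the backend.
// Assumption: strict round-robin issue. After a branch, the pipeline needs
// `branchDelayCycles` cycles before the target can be fetched. Only the
// cycles that belong to the branching thread show up as its delay slots.
int visibleDelaySlots(int branchDelayCycles, int threads) {
  assert(branchDelayCycles >= 0 && threads >= 1);
  // Count the thread's own issue slots among the delay cycles.
  return branchDelayCycles / threads;
}

// Pick the delayed form only when the thread still sees a delay slot.
std::string selectBranchMnemonic(int threads, int branchDelayCycles = 3) {
  return visibleDelaySlots(branchDelayCycles, threads) > 0 ? "brid" : "bri";
}
```

With these assumptions the 1-threaded variant sees 3 delay slots and gets ‘brid’, while the 4-threaded variant sees none and gets ‘bri’. The values for 2 and 3 threads are consequences of the round-robin assumption and have not been checked against hardware.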
I also found that there is no support for multiple delay slots in the delayslotfiller.cpp file. On the 1-threaded variant, a branch would need 3 instructions in its delay slots in order to execute without delay. The backend does not support this at the moment, but I found that it can be added by modifying delayslotfiller.cpp to include some sort of recursion.
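As a rough idea of what filling several delay slots could look like, here is a toy model that is independent of LLVM. It is not the code in delayslotfiller.cpp. `ToyInst`, its `canMove` flag and `fillDelaySlots` are all hypothetical, and the sketch uses a simple loop rather than recursion. It moves up to `slots` instructions from directly before a branch into the slots and pads the rest with `nop`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction for illustration only; not an LLVM MachineInstr.
struct ToyInst {
  std::string text;
  bool isBranch;
  bool canMove;  // hypothetical flag: safe to place after the branch
};

// Fill `slots` delay slots after every branch in `block`.
std::vector<ToyInst> fillDelaySlots(const std::vector<ToyInst>& block,
                                    int slots) {
  assert(slots >= 0);
  std::vector<ToyInst> out;
  for (const ToyInst& inst : block) {
    if (!inst.isBranch) {
      out.push_back(inst);
      continue;
    }
    // Take movable instructions from directly before the branch,
    // keeping their original order.
    std::vector<ToyInst> filler;
    while (static_cast<int>(filler.size()) < slots && !out.empty() &&
           out.back().canMove && !out.back().isBranch) {
      filler.insert(filler.begin(), out.back());
      out.pop_back();
    }
    out.push_back(inst);
    for (ToyInst f : filler) {
      f.canMove = false;  // already in a slot, must not be moved again
      out.push_back(f);
    }
    // Pad any slots that could not be filled.
    for (int i = static_cast<int>(filler.size()); i < slots; ++i) {
      out.push_back(ToyInst{"nop", false, false});
    }
  }
  return out;
}
```

For a block containing a movable `add` followed by a branch, calling `fillDelaySlots(block, 3)` yields the branch, then `add`, then two `nop` instructions. A real filler would also have to check register and memory dependencies before moving anything, which the `canMove` flag only stands in for.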
More on this later after it has been implemented and tested.