It is quite amazing how often I discover that my understanding of the technology I am working on is faulty or shallow. Those discoveries normally come as I go deeper into the technology at hand. I am opening with this to make up for any faulty explanations I’ve given about AEMB, which I am beginning to notice as I look at the threading model. However, I won’t go on and list those mistakes; admitting them is more than enough!
In this post, I’ll explain several findings about the threading model of AEMB, its program flow, and the changes that become necessary when I change the threading model.
Threading behaviour
In AEMB, each thread has its own register file, so no dependencies are expected to occur between threads. More importantly, AEMB has two modes of issuing instructions to the threads. In the first mode, the same instruction is issued twice in consecutive clock cycles, that is, one instruction writes to both threads. This is used at the beginning of the program, before the flow is split between the two threads. I think this is because the setup required for the data memory and the register files needs to take place for both threads. The second mode takes effect after the program is split into two threads, and the instructions of the two threads are then interleaved in a fine-grained manner.
The splitting of threads takes place in the program init function and depends on the 4th bit in the MSR, the Mutex bit.
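To make the two modes concrete, here is a minimal Python sketch of the issue order as I understand it. It is only a model of the behaviour described above, not AEMB code: the function names and the instruction labels are made up for illustration, and I assume both threads have the same number of instructions after the split.

```python
from typing import Dict, List, Optional, Tuple

# One issue slot: (cycle, thread id, instruction label)
Slot = Tuple[int, int, str]


def issue_trace(shared: List[str], t0: List[str], t1: List[str]) -> List[Slot]:
    """Build a cycle-by-cycle issue trace for a two-thread core (hypothetical model)."""
    trace: List[Slot] = []
    cycle = 0
    # Mode 1: before the split, every instruction is issued twice in
    # consecutive cycles, once for each thread.
    for instr in shared:
        for thread in (0, 1):
            trace.append((cycle, thread, instr))
            cycle += 1
    # Mode 2: after the split, the two threads alternate every cycle.
    # Assumption: t0 and t1 have equal length (zip stops at the shorter one).
    for a, b in zip(t0, t1):
        trace.append((cycle, 0, a))
        cycle += 1
        trace.append((cycle, 1, b))
        cycle += 1
    return trace


def min_same_thread_gap(trace: List[Slot]) -> Optional[int]:
    """Smallest cycle distance between consecutive slots of the same thread."""
    last_seen: Dict[int, int] = {}
    best: Optional[int] = None
    for cycle, thread, _ in trace:
        if thread in last_seen:
            gap = cycle - last_seen[thread]
            best = gap if best is None else min(best, gap)
        last_seen[thread] = cycle
    return best


trace = issue_trace(["init_a", "init_b"], ["x0", "x1"], ["y0", "y1"])
for slot in trace:
    print(slot)
print("minimum same-thread gap:", min_same_thread_gap(trace))
```

In this model a thread never owns two adjacent issue slots, in either mode, which is the property the rest of the pipeline relies on.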
The following picture shows the moment at which the program first splits into two threads
Because the threads are used as explained above, the current AEMB does not need to resolve data dependencies between back-to-back instructions: no two back-to-back instructions ever come from the same thread.
Thus, some changes need to take place in AEMB for it to function properly with a coarse-grained threading model, where a single thread issues several instructions in a row.
First, data dependencies need to be resolved for back-to-back instructions, for instructions with one gap between them, and for instructions with two gaps between them. The forwarding unit for one-gap dependencies is already in use in the current AEMB. A similar forwarding ability for back-to-back instructions needs to be added. I will only forward data coming from the ALU. Results that can’t be forwarded, such as those of multiplication and memory access instructions, will be handled by inserting bubbles into the pipeline. As for two-gap dependencies, the required data is at the write-back stage, but it is written to the register file on the same clock edge on which it is supposed to enter the execution stage. Either forwarding or a register file that is read and written on different clock edges will be used. Personally, I prefer forwarding, to avoid any design complications that might arise from accessing the register file at different clock edges.
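The plan above can be written down as a small decision table. The Python sketch below is only my way of summarising it, not the actual hardware logic: the enum, the function name and the returned labels are all hypothetical. It also assumes that the existing one-gap forwarding unit covers every kind of producer, which I have not verified.

```python
from enum import Enum


class Producer(Enum):
    """Kind of instruction that produces the value (hypothetical grouping)."""
    ALU = "alu"
    MUL = "mul"
    MEM = "mem"


def resolve_hazard(distance: int, producer: Producer) -> str:
    """Pick a resolution for a dependency inside one thread.

    distance is counted in instructions: 1 = back to back,
    2 = one gap, 3 = two gaps.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if distance == 1:
        # New forwarding path, ALU results only; everything else stalls.
        return "forward_new" if producer is Producer.ALU else "bubble"
    if distance == 2:
        # Assumption: the forwarding unit already in AEMB handles this case.
        return "forward_existing"
    if distance == 3:
        # Value is in write-back; forwarding is preferred over a
        # register file that is read and written on different clock edges.
        return "forward_writeback"
    # Further apart: the value is already in the register file.
    return "register_file"


for d in (1, 2, 3, 4):
    for p in Producer:
        print(d, p.value, resolve_hazard(d, p))
```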
Second, the PC register might need some changes so that it can keep the address of the next instruction of the thread that is about to run.
The only thing remaining before I can make the edits is to figure out where exactly in the program the content of each thread is decided. I know where the threading split occurs, but I don’t know how the program determines which part of the program belongs to which thread. I will try to figure it out in a limited time, and if I fail, I will go ahead and change the threading model and make all the necessary changes before taking my time to understand how the C++ code works for AEMB.
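One way to picture the PC change is to keep one program counter per thread and select between them on a thread switch. The Python sketch below is only a guess at what that could look like, not how AEMB does it today: the class, its methods and the entry addresses are invented for illustration, branches are ignored, and the step of 4 assumes 32-bit instructions.

```python
from typing import List, Tuple


class ThreadedPC:
    """Hypothetical per-thread program counter for coarse-grained switching."""

    def __init__(self, entry_points: List[int], step: int = 4) -> None:
        self.pcs = list(entry_points)  # one saved PC per thread
        self.active = 0                # thread currently issuing
        self.step = step               # bytes per instruction (assumed 4)

    def fetch(self) -> Tuple[int, int]:
        """Return (thread, address) and advance only the active thread's PC."""
        addr = self.pcs[self.active]
        self.pcs[self.active] += self.step
        return self.active, addr

    def switch(self) -> None:
        """Hand the pipeline to the next thread; its PC was kept untouched."""
        self.active = (self.active + 1) % len(self.pcs)


pc = ThreadedPC([0x100, 0x200])  # made-up entry addresses
print(pc.fetch())  # thread 0 runs two instructions in a row
print(pc.fetch())
pc.switch()
print(pc.fetch())  # thread 1 resumes from its own address
pc.switch()
print(pc.fetch())  # thread 0 continues where it left off
```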