Finally, this week I finished on the schematic of the whole demosaic core, described it in Verilog, and simulated it using the Icarus Verilog simulator. In the path of implementation, many problems occurs and it was really interesting to tackle them.
The Edge of Bilinear Interpolation
Bilinear interpolation is a simple and easy method to interpolate the Bayer pixel, which uses side (neighbor) pixels to perform the interpolation. Initially, I was planning to build it with a switching in interpolation algorithm using multiplexer. I found out that these will actually cost a lot of multiplexer to be used, approximately 13 multiplexer for each color. Where 3 pixels color would actually takes up to 13 x 3 = 39 multiplexers, just to implement the bilinear interpolation algorithm combined with the HQLI. That is pretty wasteful for resources and inefficient, because the pipeline that would need to be done would waste more resources too. The consideration of pipelining is critical, as for each stage of pipeline it would cost 8 register to be used, where a 6 stage pipeline would take up to 48 registers for each pixel.
Considering these inefficient way of implementation, I start to look through the algorithm again.
As shown from figure above, the bilinear interpolation algorithm of a quadrant image has 12 combinations and can be divided into 6 different patterns of interpolation (including first line and last line, and other boundaries), where it would be a 6 different circuitry to be implemented in hardware. Just like what my supervisor said to me previously, 6 different patterns of circuitry that would only be used by one or two times in the whole circuit, is extremely inefficient. I actually agreed to this statement, and it triggers my thinking on the implementation when I was drawing the huge multiplexers design.
The beauty of the bilinear interpolation is that it would always getting the NEIGHBOUR pixels only, which means that whenever pixel that you select, the interpolation pattern would be similar of using its neighbour pixels. Thus, I thought of minimize the 6 patterns, the first line and the final line of the algorithm can actually be combined. Consider the diagram below, we have 5 sets of shift register representing 5 lines respectively, where line 1 represent the first set of shift register and line 5 represent the last set of shift register. The first line and last line interpolation actually uses “line 3 and line 4” and “line 2 and line 3” respectively.
These two line are in common of their respective line is on the set C shift register, where set D and set B are the neighbor pixels for first line and last line respectively. Thus, a multiplexer can be used on to select set D or set B, for first line or last line interpolation, which saved the implementation of a new pattern circuitry. The first pixel on the boundary side can also be implemented using the same method, but it would be using more multiplexer to implement such logic. After several drafting and comparison, I found that such implementation is not needed as the circuitry for it is repetitive and I could directly plug the output from the repetitive circuitry into it.
Combine Bilinear Interpolation with High-Quality Linear Interpolation
In order to combine the HQLI with bilinear interpolation, I noticed that the HQLI algorithm that had been designed in the first place have a good advantage in combining the bilinear interpolation. Besides, the HQLI algorithm had sorted out that which kind of pixel (R,G,B) that could be ported into the algorithm, and most of the bilinear interpolation is part of the original HQLI pattern. Thus, knowing the pixel that could be input into the algorithm, I did a multiplexing in the selection of algorithm and link it to the bilinear interpolation, which saved the implementation of bilinear pattern, as it would be part of the HQLI pattern circuitry. The advantage of such implementation also allows the whole image to interpolate only with bilinear interpolation, which one can examine the difference of image quality with bilinear interpolation or with HQLI.
With the combined circuitry, it actually reduced the original circuitry into total of 17 multiplexers for all three R,G and B pixels, and with a 5 stage pipeline.
The Two’s Complement Subtraction
As I implement the circuitry that had been drew using Verilog, and simulate it using the Icarus simulator, I found that there are something wrong in the calculation. The calculation answer was in two’s complement form when it is a negative number. This should never happen because pixel values should be absolute numbers. I made my approach into Xilinx adders and subtractors documentation, where they specify that the second operand would be treated as two complement. Thus, a two complement subtractor can be implemented by using the following code :
reg [2:0] A, B;
wire [3:0] R = A – B;
where R would be representing the signed bit, to link into the reset port of register.
The Optimization of Demosaic Core
After testing with the RGGB pattern of the demosiac core, I synthesized it using the Xilinx ISE, and it is using 819 FFs. It actually meets the expectation as it has a lower number of FFs compared to the example I provided in the previous blog. As I looked through the circuit, I think of several optimization can still be done to improve the circuitry, which could reduces the number of Flip-Flops to be used.
- The Division before additions : Division which is also using wire shifting can be done prior before the additions, so that the pipelined shift register number, adders and subtractors size could be reduced. Though the result may vary a little with the actual one.
- The Generation of Demosaic Control signals : The control signals that are generated by the internal circuitry to control the demosaic sequence, can still be optimized.
Since the simulation had done, and I had verified the calculated result for RGGB combination, I am really looking forward for implementing it on the FPGA board next week. 😀