This week, things are getting clearer and the work is getting better. I began to realize my previous mistake, which was assuming that the image processing pipeline core could be driven at a clock speed different from the CCD clock speed. The core should run as close to the CCD clock speed as possible, since this gives a lower-power design.
The Boundary Bilinear Interpolation
Once I had finished the circuitry of the High-Quality Linear Interpolation (HQLI) and tested it with the Icarus simulator, I moved on to the design of the boundary bilinear interpolation. I approached the implementation by drawing out the combinations needed by the algorithm. Overall, the bilinear interpolation splits into 3 parts: the first line, the second line / the line before the last line, and the last line. The second-line bilinear interpolation is also used to calculate the first and second pixels on the lines that use HQLI.
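The split described above can be sketched as a small selection function. This is only a behavioral illustration in Python, not the actual circuit: the function name, the labels, and the zero-based `row`/`col` indexing are my own assumptions, and the handling of the right edge of HQLI lines is left out because it is not covered here.

```python
def interpolation_method(row, col, height):
    """Pick the interpolation variant for a pixel (hypothetical labels).

    row, col are zero-based; height is the number of lines in the frame.
    """
    if not 0 <= row < height:
        raise ValueError("row outside the frame")
    if row == 0:
        return "bilinear_first_line"
    if row == height - 1:
        return "bilinear_last_line"
    if row == 1 or row == height - 2:
        # Second line and the line before the last line share one variant.
        return "bilinear_second_line"
    if col < 2:
        # First and second pixels of an HQLI line reuse the second-line variant.
        return "bilinear_second_line"
    return "hqli"


if __name__ == "__main__":
    for r in (0, 1, 2, 477, 478, 479):
        print(r, interpolation_method(r, 5, 480))
```

The frame height of 480 in the usage lines is just an example value.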
The tricky part of the bilinear interpolation, and the part that confused me most during the implementation, is the computation of the first and second pixels. It confused me several times; maybe I am just not that good at visualizing the pixels. I finally solved it by drawing the sequence of pixels on each line one by one. As discussed previously, the RGB calculation is done as the pixels are stored into the BRAM. The pixel order is therefore critical for the calculation: the first pixel requires only 3 pixels, and the second pixel requires 4 pixels. Since I am designing the pixels to be read from shift registers, the shift registers to be selected differ for each pixel calculation. For example, the first pixel requires the shift register values from A4 (latest pixel), A3, and A2, and the second pixel requires the values from A4 (latest pixel), A3, A2, and A1. The catch is that the A4, A3, A2 of the first pixel calculation become the A3, A2, A1 of the second pixel calculation, respectively.
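To make the renaming of the taps easier to see, here is a minimal behavioral model in Python of a 4-stage shift register, with A1 as the oldest stage and A4 as the latest. The class and function names are hypothetical and only the tap selection is modeled; the arithmetic performed on the selected pixels is not.

```python
from collections import deque

# Taps read for each boundary pixel, as described above (A4 = latest pixel).
BOUNDARY_TAPS = {
    "first": ("A4", "A3", "A2"),
    "second": ("A4", "A3", "A2", "A1"),
}


class ShiftRegister:
    """Behavioral model of a 4-stage pixel shift register."""

    def __init__(self, depth=4):
        self.stages = deque([0] * depth, maxlen=depth)

    def shift_in(self, pixel):
        # Every stored value moves one stage toward A1; the oldest is dropped.
        self.stages.append(pixel)

    def tap(self, name):
        # "A1" maps to index 0 (oldest), "A4" to index 3 (latest).
        return self.stages[int(name[1:]) - 1]


def read_taps(sr, which):
    """Return the values of the taps used for the 'first' or 'second' pixel."""
    return [sr.tap(name) for name in BOUNDARY_TAPS[which]]


if __name__ == "__main__":
    sr = ShiftRegister()
    for p in (10, 20, 30):
        sr.shift_in(p)
    first = read_taps(sr, "first")    # [30, 20, 10] read from A4, A3, A2
    sr.shift_in(40)
    second = read_taps(sr, "second")  # [40, 30, 20, 10] read from A4..A1
    print(first, second)
```

After one more shift, the three values that were read from A4, A3, A2 for the first pixel sit in A3, A2, A1, which is exactly why the tap selection has to change between the two calculations.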
The best part of drawing out every single circuit is that I could see where calculations are replicated: the same calculations performed in parts of the HQLI algorithm can be ported directly to the output, which saves some adders.
The Output Signals
The output signals of the demosaic core are clear in my mind now. I have generated the control output signals, and I started to see things that I had missed previously, such as the data valid signals. Previously, I thought that the demosaic core could be done without the bilinear interpolation, and I neglected the last line and the line before it. As a result, the circuitry “forgot” to interpolate those two lines. This is caused by the HQLI algorithm, which requires 5×5 pixels of information, so the arrival of the last line corresponds to the interpolation of the third line from the end. After that line was interpolated, the HQLI ended and the demosaic ended with it, which is not correct. This would produce a distorted image with 2 lines missing in the middle of the image.
Since the data valid signal marks where the pixel information is, the demosaic core should delay the signals by 2 lines plus the pipeline latency. I tried to find information on generating delays for signals, but mostly I found information on the “real delay” of signals, where the signal information is already lost after the delay. For a delay of only a few cycles, a shift register may be the best choice. However, the delay that I am looking for is more than 2000 cycles, and I don’t think it is appropriate to implement that as a shift register. Thus, I thought of implementing the delay with BRAM, storing the signal and retrieving it once the delay cycles have elapsed.
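The BRAM idea can be modeled as a circular buffer with a single address counter: on each clock the old value at the current address is read out, the new value is written in its place, and the address advances. Below is a rough behavioral sketch in Python under that assumption. The class name, the line width, and the pipeline latency are hypothetical values, and the one-cycle read latency of a real synchronous BRAM is ignored.

```python
class BramDelayLine:
    """Behavioral model of a fixed delay built from a memory block.

    The output at cycle n is the input from cycle n - delay_cycles.
    """

    def __init__(self, delay_cycles):
        if delay_cycles < 1:
            raise ValueError("delay_cycles must be at least 1")
        self.mem = [0] * delay_cycles  # memory contents, cleared at start
        self.addr = 0                  # shared read/write address counter

    def clock(self, din):
        # Read-before-write: take the value stored delay_cycles ago,
        # then overwrite the same location with the new input.
        dout = self.mem[self.addr]
        self.mem[self.addr] = din
        self.addr = (self.addr + 1) % len(self.mem)
        return dout


if __name__ == "__main__":
    LINE_WIDTH = 1024       # hypothetical pixels per line
    PIPELINE_LATENCY = 4    # hypothetical latency of the interpolation pipeline
    delay = BramDelayLine(2 * LINE_WIDTH + PIPELINE_LATENCY)
    # Feed a single data valid pulse and find the cycle where it reappears.
    outputs = [delay.clock(1 if n == 0 else 0) for n in range(3000)]
    print(outputs.index(1))
```

Compared with a shift register, only the address counter toggles every cycle, and the storage depth sets the delay, so a delay of a few thousand cycles costs one memory block rather than thousands of flip-flops.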
I have drafted the final part of the demosaic core, which is the output multiplexer that selects the interpolation method, the first/second/middle lines, and the pixel color information. For the moment, the circuitry is quite large, as it uses a lot of multiplexers.