This week continued the groundwork on the synchronous USART in order to optimize the reading process that will subsequently program the FPGA. After the problem from the previous week was solved, time was spent on getting the PIC to read the bitstream data from the SD card as fast as possible. A lot of time was also spent diagnosing the current writing stack to find its bottleneck and speed it up as much as possible. A footprint mechanism was also added to the MBR to indicate whether the SD card in use needs to be formatted for this application.
Debugging the Synchronous UART
The week before, the UART was unable to function at full speed in synchronous mode. This week the problem was diagnosed: when SPBRG is set to 0, the program gets stuck in an infinite loop because the UART returns a ‘busy’ status the whole time. Removing the recommended “while(Busy1USART())” wait loop solves the infinite-loop problem. However, timing the reading process shows that the PIC still takes about 4 seconds to read the entire bitstream.
Imprinting the MBR with the “AESTE” footprint
One more important addition to the current stack was the ability to detect whether the SD card used with the board has been formatted correctly to work with the PIC as intended. After discussions with the supervisor, it was decided that a footprint should be included in the MBR of the SD card. When the SD card is initialized, the PIC first checks for the footprint in the SD card’s MBR. If the footprint is not found, the PIC formats the SD card and then conditions the MBR to support the format used for this application. In the same process, the footprint is added together with an initial Start Read Address and an appropriate Start Write Address. This was easy to implement, as previous experience in editing the MBR made it straightforward to apply the footprint. The space reserved for the Partition 2 information was used for the footprint.
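The footprint check described above could look roughly like the following C sketch. The offset of the second partition entry (byte 462 of the 512-byte MBR sector) follows the standard MBR layout. Everything else is an assumption made for illustration: the function names are hypothetical, and the field layout (the 5-byte footprint followed by two 32-bit little-endian addresses) is not confirmed by the actual implementation.

```c
#include <stdint.h>
#include <string.h>

#define MBR_SIZE          512u
#define PARTITION2_OFFSET 462u  /* 446 (partition table) + 16 (entry 1) */
#define FOOTPRINT         "AESTE"
#define FOOTPRINT_LEN     5u
/* Assumed layout inside the 16-byte entry: footprint, read addr, write addr. */
#define READ_ADDR_OFFSET  (PARTITION2_OFFSET + FOOTPRINT_LEN)
#define WRITE_ADDR_OFFSET (READ_ADDR_OFFSET + 4u)

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store a 32-bit value into a byte buffer in little-endian order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Returns 1 if the MBR sector already carries the footprint, else 0. */
int mbr_has_footprint(const uint8_t *mbr)
{
    return memcmp(&mbr[PARTITION2_OFFSET], FOOTPRINT, FOOTPRINT_LEN) == 0;
}

/* Writes the footprint and both start addresses into the MBR buffer. */
void mbr_stamp_footprint(uint8_t *mbr, uint32_t read_addr, uint32_t write_addr)
{
    memcpy(&mbr[PARTITION2_OFFSET], FOOTPRINT, FOOTPRINT_LEN);
    put_le32(&mbr[READ_ADDR_OFFSET], read_addr);
    put_le32(&mbr[WRITE_ADDR_OFFSET], write_addr);
}

uint32_t mbr_start_read_addr(const uint8_t *mbr)
{
    return get_le32(&mbr[READ_ADDR_OFFSET]);
}

uint32_t mbr_start_write_addr(const uint8_t *mbr)
{
    return get_le32(&mbr[WRITE_ADDR_OFFSET]);
}
```

In this sketch, the initialization code would read sector 0 into a 512-byte buffer and call mbr_has_footprint(). Only if that returns 0 would it format the card, call mbr_stamp_footprint() and write the sector back.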
Another change was the addition of a Start Write Address to the MBR. This allows the PIC to simply read out both addresses and perform the assigned tasks, rather than deciding where to start writing based on the Start Read Address, as in the current method. Once an upload is completed, the Read and Write addresses swap positions, ensuring a fully functional Ping-Pong system. The following is an image of the implementation of the footprint and the corresponding addresses on the MBR of the SD card:
Optimization of the Reading process
With the synchronous mode functioning properly, attention turned to the reading stack. The current stack works in the following way:
It was clear that the slow-down was caused by the loop-back mechanism, which has to go through many conditions in the current stack before it gets back to reading the next 512 bytes. Further research into the multi-block read process in the SD card Physical Layer documentation showed that it is possible to read a steady stream of bytes rather than separate 512-byte blocks (as the supervisor had also suggested). However, some considerations need to be made. Firstly, when CRC mode is activated on the SD card during initialization, the multi-block read is accompanied by 2 bytes of CRC at every block boundary. Furthermore, two more “busy (0xFF)” bytes are sent by the SD card before the data from the next block is read.
To implement a faster read process, a simple for loop was used, with the file size (obtained from the first 4 bytes of the bitstream) as the number of bytes remaining, and every byte read out of the SD card is transmitted immediately. This simplifies the reading loop and ensures that the UART does not wait for the next block to be prepared before transmitting data. To handle the block boundary, a secondary condition performs 4 dummy reads (2 CRC bytes + 2 busy bytes) after every 512 bytes read. The following is an illustration of the flow of the program when the read function is initiated:
After these conditions and changes were implemented, the read times were tested and the results were very promising. In the initial tests, the read was completed within 0.6 seconds. Using the hardware debugger on the PICKit3, most of the data being transmitted was verified to be accurate.
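As a rough illustration of the streaming read loop described above, the following host-side C sketch forwards one byte at a time and discards 4 bytes at every 512-byte boundary. All names here are hypothetical. The SD card and the UART are reached through function pointers so that the loop can be tried off-target; on the PIC these would be the actual SPI read and USART write routines. The mock_sd and mock_uart helpers exist only to exercise the loop and are not part of the real stack.

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE  512u
#define DUMMY_READS 4u   /* 2 CRC bytes + 2 busy bytes, as described above */

typedef uint8_t (*read_byte_fn)(void *ctx);
typedef void (*send_byte_fn)(void *ctx, uint8_t b);

/* Streams file_size payload bytes from the card to the UART.
 * block_offset is the position inside the current 512-byte block at entry
 * (for example 4 if the size header was already consumed from this block). */
void stream_bitstream(uint32_t file_size, uint16_t block_offset,
                      read_byte_fn sd_read, void *sd_ctx,
                      send_byte_fn uart_send, void *uart_ctx)
{
    uint16_t in_block = block_offset;
    uint32_t remaining;
    uint8_t i;

    for (remaining = file_size; remaining > 0; remaining--) {
        uart_send(uart_ctx, sd_read(sd_ctx));
        if (++in_block == BLOCK_SIZE) {
            /* Block boundary: throw away the CRC and busy bytes. */
            for (i = 0; i < DUMMY_READS; i++) {
                (void)sd_read(sd_ctx);
            }
            in_block = 0;
        }
    }
}

/* Mock card: returns payload bytes plus 4 filler bytes after each block. */
typedef struct {
    const uint8_t *data;
    uint32_t pos;
    uint16_t in_block;
    uint8_t filler_left;
} mock_sd;

uint8_t mock_sd_read(void *ctx)
{
    mock_sd *sd = (mock_sd *)ctx;
    uint8_t b;

    if (sd->filler_left > 0) {
        sd->filler_left--;
        return 0xFF;
    }
    b = sd->data[sd->pos++];
    if (++sd->in_block == BLOCK_SIZE) {
        sd->in_block = 0;
        sd->filler_left = DUMMY_READS;
    }
    return b;
}

/* Mock UART: captures every transmitted byte in a buffer. */
typedef struct {
    uint8_t *buf;
    uint32_t len;
} mock_uart;

void mock_uart_send(void *ctx, uint8_t b)
{
    mock_uart *u = (mock_uart *)ctx;
    u->buf[u->len++] = b;
}
```

The sketch only checks the byte accounting. It does not model what the UART does on the real hardware while the dummy reads are being performed.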
Optimization of the Writing Process
At the start of the week, the PIC needed about 52 seconds to write the whole bitstream, with Base64 decoding, onto the SD card. This was definitely too slow and needed to be improved. On the supervisor’s advice, the stack was analysed to find the bottleneck of the whole process and pinpoint where improvements could be made, so the process was dissected step by step. First, the network transfer speed for the whole bitstream was tested. Without the Base64Decoder and without writing any data onto the SD card, the process was completed within 23 seconds for a bitstream with a file size of 350kB (460kB in Base64). This gives a network transfer rate of about 20kB/sec, which is the best performance that can be achieved under the current network stack set-up. The second test included the Base64Decoder, and the results were quite surprising. With only the network transfer and Base64 decoding, the whole process took 47 seconds to complete. Decoding therefore adds about 24 seconds, whereas writing to the SD card accounts for only about 5 of the 52 seconds. Thus it was clear that the inefficiency of the Base64Decoder was the bottleneck of the process under the current network stack set-up.
The current function written by my colleague for the Base64Decoder was analyzed. The function makes use of some pointer mathematics and a look-up table to decode the data. Moreover, the code caters to many different cases and character conditions in the encoded data, such as line breaks, white space and padding characters. For the purpose of our application, all these conditions can be ignored. So, after some research, an improved implementation of the Base64Decoder was applied.
The new implementation takes in the same arguments and returns the same buffer with 3 bytes as output. But instead of using the table and complicated pointer mathematics, it simply takes in each Base64 byte, converts it to 6 bits and writes the bits to the correct position in the destination buffer. The accumulated buffer is then returned by the function. The function ignores all non-Base64 characters and carries on, thus it processes all the data faster.
Using this new optimized function, the upload time was tested and found to be consistently about 37 seconds for the same bitstream. This is an improvement of 15 seconds, or about 29%, on the previous writing time of 52 seconds.
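The decoding approach described above (character to 6 bits, bits shifted into the output, anything outside the Base64 alphabet skipped) can be sketched in C as follows. This is a minimal illustration and not the function used in the stack: the names b64_value and b64_decode are hypothetical, and the sketch decodes a whole buffer in one call rather than returning 3 bytes at a time.

```c
#include <stdint.h>
#include <stddef.h>

/* Maps one Base64 character to its 6-bit value using range checks
 * instead of a look-up table. Returns -1 for any other character. */
static int b64_value(uint8_t c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes in_len Base64 characters from in into out.
 * out must have room for at least (in_len * 3) / 4 bytes.
 * Returns the number of decoded bytes written. */
size_t b64_decode(const uint8_t *in, size_t in_len, uint8_t *out)
{
    uint32_t acc = 0;   /* bit accumulator, newest bits at the bottom */
    uint8_t bits = 0;   /* number of not-yet-emitted bits in acc */
    size_t n = 0;
    size_t i;

    for (i = 0; i < in_len; i++) {
        int v = b64_value(in[i]);
        if (v < 0) {
            continue;   /* skip padding, line breaks and other characters */
        }
        acc = (acc << 6) | (uint32_t)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out[n++] = (uint8_t)(acc >> bits);  /* emit the top 8 pending bits */
        }
    }
    return n;
}
```

Because padding and line breaks are simply skipped, the sketch performs no validation of the input. That trade-off matches the reasoning above, where those cases are considered irrelevant for this application.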
Some good progress was made this week. The read speed was brought close to what should be the optimal speed achievable using the USART on the PIC. One point of uncertainty remains: it is not yet known how the UART behaves while the dummy reads are being performed. This can be verified in the weeks to come. The improvement to the writing speed was also a step forward, but only a small one. It is hoped that more tweaks like the Base64Decoder change can be made to improve the speed further and achieve the optimal speeds.