System on Chip
The title might be a little too fancy for the work I have done so far, but hopefully our switches will evolve from simple switches into a smart interconnect between the processor and all other system components.
Do you remember the second version of the accelerator switch I talked about in my previous blog post? It turned out it’s not a good choice. The reason is that my implementation inferred a tri-state buffer inside the design. However, tri-states should only be connected to the IO ports. I was told that having them inside the system greatly affects its speed.
This week the focus was on creating switches for the IO side. The challenge with IO devices is that their number is much higher than the number of accelerators we plan to support on our system. In fact, the high number of IO devices is one of our advantages over available micro-controllers. So we have to find a smart way to accommodate many IO devices without affecting the performance of the system.
My first option, as usual, was to start with the simplest possible switch, composed of a few multiplexers. The other version was to create switches that pass signals through each other. If the address given can’t be served by the first switch, it passes the signal to the second switch, and so on until the switch connected to the designated address is found. Delay is the main challenge for this second design, since a request may have to travel through several switches before it reaches its device.
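To make the difference between the two concepts a bit more concrete, here is a rough behavioural sketch in Python. It is not the actual hardware description; the function names, the way devices are grouped and the "stages" count are assumptions made up for illustration. The stage count only stands in for how many switches a request passes through, not for real timing.

```python
def mux_switch(devices, address):
    """Flat multiplexer switch: the address picks the device directly.

    Returns (device, stages), where stages is the number of switches
    the request went through (always 1 here).
    """
    return devices[address], 1


def chained_switch(switches, address):
    """Chained switches: each switch serves a contiguous block of addresses.

    `switches` is a list of device lists (hypothetical grouping). A switch
    that cannot serve the address passes the request on to the next one.
    Returns (device, stages).
    """
    base = 0
    for stages, local_devices in enumerate(switches, start=1):
        if address < base + len(local_devices):
            return local_devices[address - base], stages
        base += len(local_devices)  # not ours, pass it on
    raise IndexError("address not served by any switch")


# Eight hypothetical SPI devices, either on one mux or split over two switches.
devices = ["spi%d" % i for i in range(8)]
chain = [devices[0:4], devices[4:8]]

flat_result = mux_switch(devices, 5)
chain_result = chained_switch(chain, 5)
```

In this model both switches reach the same device, but the chained version needs two stages for address 5 while the flat mux always needs one. That growing stage count is the delay concern mentioned above.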
Now that I had two simple concepts, it was time to run some tests to choose the better switch in terms of speed and area.
The first part of testing covers the modular switches, that is, switches made from only one module. I tested switches that can accommodate 2, 4, 8, 16 and 32 devices. The system I synthesized contained the processor, the data and instruction RAMs, the switch being tested (fully connected to master SPI devices) and finally a SHA-1 core so that the accelerator bus is not left floating. The next thing to do is to mix those switches together to form bigger switches and compare the results from each setup. For this comparison I’ll use the results of the synthesis report: the slices occupied and the maximum frequency the design can support.
Finally, I would like to explain the current memory map of our system and how the address lines are decoded. AEMB is a 32-bit processor with 30 address lines. Since each address selects a 32-bit (4-byte) word, this gives us a total address space of 2^30 x 4 bytes = 4 GB. Moreover, the instructions (IRAM) and the data (be it program data (DRAM) or IO data) have separate buses. We take advantage of that by making the instruction memory occupy the same part of the memory map as the IO devices, that is, the lower 2 GB. The upper 2 GB is occupied by the DRAM.
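The arithmetic behind those numbers can be checked with a few lines of Python. This is only a sketch: it assumes that the 30 address lines select 32-bit words, which is the reading that makes the 4 GB figure work out, and the constant and region names are my own.

```python
ADDRESS_LINES = 30   # from the text
BYTES_PER_WORD = 4   # assumption: each address selects one 32-bit word
GIB = 1 << 30

total_bytes = (1 << ADDRESS_LINES) * BYTES_PER_WORD
half_bytes = total_bytes // 2

# Byte ranges of the two halves of the memory map (inclusive bounds).
memory_map = {
    "IRAM / IO devices (lower half)": (0, half_bytes - 1),
    "DRAM (upper half)": (half_bytes, total_bytes - 1),
}

for name, (start, end) in memory_map.items():
    print("%-32s 0x%08X - 0x%08X" % (name, start, end))
```

The IRAM and the IO devices can share the lower half because they sit on different buses, so an instruction fetch and an IO access never compete for the same address.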
For the instruction bus, only the lower address lines that correspond to instruction addresses are decoded by the IRAM. For the data bus, the most significant bit is used to select between DRAM and IO devices. The DRAM receives the lowest address lines, corresponding to data addresses. As for IO devices, the least significant 4 lines are supplied to the devices themselves for register selection. The next higher 8 bits are decoded by the IO switch to select one of up to 256 IO devices. Our first prototype can’t handle more than 32 devices though.
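Here is a small Python sketch of the data bus decoding described above. It is a behavioural model only, not the real decoder. It assumes the address is the value on the 30 address lines, that a set most significant bit means DRAM (since the DRAM sits in the upper half of the map), and that any bits between the device field and the top bit are simply ignored for IO accesses. The function and constant names are hypothetical.

```python
ADDRESS_LINES = 30   # from the text
REGISTER_BITS = 4    # lowest 4 lines: register select inside a device
DEVICE_BITS = 8      # next 8 lines: device select in the IO switch


def decode_data_address(addr):
    """Split a data bus address into its target and local fields.

    Returns ("DRAM", offset) or ("IO", device, register).
    """
    if not 0 <= addr < (1 << ADDRESS_LINES):
        raise ValueError("address does not fit on the address lines")

    msb = addr >> (ADDRESS_LINES - 1)
    if msb:  # upper half of the map: DRAM
        offset = addr & ((1 << (ADDRESS_LINES - 1)) - 1)
        return ("DRAM", offset)

    register = addr & ((1 << REGISTER_BITS) - 1)
    device = (addr >> REGISTER_BITS) & ((1 << DEVICE_BITS) - 1)
    return ("IO", device, register)


io_access = decode_data_address(0x123)             # device 0x12, register 0x3
dram_access = decode_data_address((1 << 29) | 5)   # DRAM, offset 5
```

With 8 device bits the decoder can name 256 devices, so a prototype limited to 32 devices would only ever see device numbers 0 to 31, which fit in the lowest 5 of those bits.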