Frustrating… that basically sums up the whole previous week.
With all the minor modules ready, I started building the main part, which is, briefly speaking, a quite elaborate state machine that uses those modules in the correct order. This turned out to be arduous work in itself, as one needs to be very careful about when to perform specific operations. In addition, I basically had to reinvent the whole algorithm, because the ones I found are either for software (so purely sequential) or use a different hardware setup (more multipliers/adders/squarers (is that even a word?)).
Everything was going great until, somewhere around the end of the week, I decided I was using too many registers, and in too complicated a way. This is a very serious problem in this kind of design. Just imagine a single 163-bit register to which 2 sources want to write. This creates a 163-bit multiplexer. Now imagine there are more registers like this, and sometimes even more than 2 sources exist. To solve this problem I replaced everything with dual-port RAM, but this in turn forced me to go through the algorithm again (writing takes longer now, which causes problems when the result of one operation is needed immediately after it's calculated).
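To put rough numbers on the multiplexer problem, here is a small Python sketch. It assumes each N-source selection is built as a tree of 1-bit 2:1 multiplexers (N - 1 of them per bit). Real synthesis tools map this onto LUTs differently, so treat the result as an order-of-magnitude estimate only. The function names and the example register list are made up for illustration and are not taken from the actual design.

```python
def mux2_per_register(width_bits, num_sources):
    """Count the 1-bit 2:1 muxes needed in front of one register.

    Selecting one of `num_sources` inputs takes (num_sources - 1)
    2:1 muxes per bit when built as a simple tree.
    """
    if width_bits < 1 or num_sources < 1:
        raise ValueError("width_bits and num_sources must be positive")
    return width_bits * (num_sources - 1)


def total_mux2(registers):
    """Sum the mux cost over a list of (width_bits, num_sources) pairs."""
    return sum(mux2_per_register(w, s) for w, s in registers)


# The case from the text: one 163-bit register with 2 writers.
single = mux2_per_register(163, 2)

# Hypothetical design with four 163-bit registers and 2 to 4 writers each.
example_design = [(163, 2), (163, 3), (163, 3), (163, 4)]
total = total_mux2(example_design)

print(single)  # 163
print(total)   # 1304
```

A dual-port RAM avoids this growth because all values share the same two ports, at the price of the extra write latency mentioned above.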
On the bright side, we decided to move some of the operations to software. The hardware part will be responsible only for finite field arithmetic. This seems reasonable, as the remaining operations are 2-3 multiplications and a few inversions, and even though they are slow in software, their cost is negligible overall, while moving them saves a lot of hardware resources. This way there is no need to implement a dedicated 163-bit modular multiplier (which would be used only 2-3 times anyway).
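For readers unfamiliar with the kind of finite field arithmetic that stays in hardware, here is a minimal software reference model in Python. It rests on two assumptions that this post does not state: that elements are stored in polynomial basis, and that the reduction polynomial is the NIST one for 163-bit binary fields, x^163 + x^7 + x^6 + x^3 + 1. The function names are illustrative, and the bit-serial loop mirrors none of the actual hardware architecture.

```python
M = 163
# Assumption: NIST reduction polynomial x^163 + x^7 + x^6 + x^3 + 1.
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_add(a, b):
    """Addition in GF(2^163) is a bitwise XOR (no carries)."""
    return a ^ b


def gf_mul(a, b):
    """Shift-and-add multiplication with reduction after every shift.

    Both inputs are expected to be already reduced (below 2^163).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:          # degree reached 163: subtract the polynomial
            a ^= POLY
    return result


def gf_square(a):
    """Squaring, done here through the generic multiplier for simplicity."""
    return gf_mul(a, a)


def gf_inv(a):
    """Inversion via Fermat: a^(2^163 - 2), as a product of a^(2^i), i = 1..162."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    result = 1
    power = a
    for _ in range(M - 1):
        power = gf_square(power)
        result = gf_mul(result, power)
    return result


x = 0x1234567
assert gf_mul(x, gf_inv(x)) == 1
```

A model like this is mostly useful as a golden reference to compare simulation results against, not as a statement of how the hardware computes them.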