While this may not be implemented in the EDK63 version of the AEMB, it is an idea that has been on my mind for quite a while. Software floating-point emulation is expensive in terms of time, while a hardware FPU is expensive in terms of resources. A hybrid approach might therefore be useful: single-precision floating-point operations could be assisted in hardware by adding some extra functionality.

While a full-blown FPU would take up too many resources, a simple device that decodes a float into its sign, exponent and mantissa and places them in separate registers could be used instead. Paired with a complementary device that reverses the process by encoding the results back into an IEEE 754 single-precision value, it could assist the software emulation routines. However, this idea needs further investigation to justify the trade-offs involved.
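To make the decode/encode idea concrete, here is a minimal C sketch of what such a device would compute, written as a software model. The `fp_fields` layout (an unbiased exponent and a 24-bit mantissa with the hidden bit restored) is an assumption for illustration, not a description of any existing AEMB hardware. Rounding and normalisation are deliberately left to the software emulation routines, so the encoder here assumes an already normalised mantissa.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical register-level view of a decoded single-precision value. */
typedef struct {
    uint32_t sign;     /* 0 or 1 */
    int32_t  exponent; /* unbiased exponent; -126 for zero and subnormals */
    uint32_t mantissa; /* 24-bit significand, hidden bit restored for normals */
} fp_fields;

/* Reinterpret a float as its raw IEEE 754 bit pattern. */
static uint32_t float_to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Decode: split a 32-bit IEEE 754 word into sign, exponent and mantissa. */
static fp_fields fp_decode(uint32_t bits) {
    fp_fields r;
    uint32_t field = (bits >> 23) & 0xFFu;  /* biased exponent field */
    r.sign = bits >> 31;
    if (field == 0) {                        /* zero or subnormal: no hidden bit */
        r.exponent = -126;
        r.mantissa = bits & 0x7FFFFFu;
    } else {                                 /* normal, infinity or NaN */
        r.exponent = (int32_t)field - 127;
        r.mantissa = (bits & 0x7FFFFFu) | 0x800000u;
    }
    return r;
}

/* Encode: pack the fields back into an IEEE 754 word. Assumes software has
   already normalised and rounded the mantissa; no overflow handling here. */
static uint32_t fp_encode(fp_fields f) {
    /* hidden bit set means a normal (or inf/NaN) value; clear means subnormal */
    uint32_t biased = (f.mantissa & 0x800000u) ? (uint32_t)(f.exponent + 127) : 0u;
    return ((f.sign & 1u) << 31) | ((biased & 0xFFu) << 23) | (f.mantissa & 0x7FFFFFu);
}
```

For example, decoding `-2.5f` (bit pattern `0xC0200000`) yields sign 1, exponent 1 and mantissa `0xA00000`, which a software multiply or add routine can work on directly as integers before handing the result back to `fp_encode`.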

This device could be implemented using a stack model attached to the accelerator interface, which would also allow the encoding/decoding process to be pipelined. In fact, if a full FPU is ever constructed in the future, it may be easier to integrate it as a stack-based co-processor than as a functional unit inside the processor.
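A stack model could behave roughly as follows: the processor pushes a raw word over the accelerator interface, a DECODE operation replaces it with three words (mantissa, exponent, sign), and an ENCODE operation folds three words back into one. The C model below is a hypothetical sketch; the stack depth, the opcode names `FP_OP_DECODE`/`FP_OP_ENCODE` and the field order on the stack are all assumptions, not an existing AEMB interface.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stack-based decode/encode co-processor model.
   Depth, opcodes and stack ordering are illustrative assumptions. */
#define FPSTACK_DEPTH 16

typedef struct {
    uint32_t data[FPSTACK_DEPTH];
    size_t   top;               /* number of valid entries */
} fpstack;

typedef enum { FP_OP_DECODE, FP_OP_ENCODE } fp_op;

static int fpstack_push(fpstack *s, uint32_t v) {
    if (s->top == FPSTACK_DEPTH) return -1;      /* overflow */
    s->data[s->top++] = v;
    return 0;
}

static int fpstack_pop(fpstack *s, uint32_t *v) {
    if (s->top == 0) return -1;                  /* underflow */
    *v = s->data[--s->top];
    return 0;
}

/* DECODE: pop one IEEE 754 word, push mantissa, exponent, sign (sign on top).
   ENCODE: pop sign, exponent, mantissa and push the packed word. */
static int fpstack_exec(fpstack *s, fp_op op) {
    uint32_t word, sign, expo, man, field, biased;
    if (op == FP_OP_DECODE) {
        /* needs one operand and room for two extra entries */
        if (s->top < 1 || s->top + 2 > FPSTACK_DEPTH) return -1;
        fpstack_pop(s, &word);
        field = (word >> 23) & 0xFFu;
        sign  = word >> 31;
        /* unbiased exponent kept as a two's-complement word */
        expo  = (field == 0) ? (uint32_t)-126 : field - 127u;
        man   = (word & 0x7FFFFFu) | (field != 0 ? 0x800000u : 0u);
        fpstack_push(s, man);
        fpstack_push(s, expo);
        fpstack_push(s, sign);
        return 0;
    }
    /* FP_OP_ENCODE: needs sign, exponent and mantissa on the stack */
    if (s->top < 3) return -1;
    fpstack_pop(s, &sign);
    fpstack_pop(s, &expo);
    fpstack_pop(s, &man);
    biased = (man & 0x800000u) ? ((expo + 127u) & 0xFFu) : 0u;
    fpstack_push(s, ((sign & 1u) << 31) | (biased << 23) | (man & 0x7FFFFFu));
    return 0;
}

/* Usage: push, decode, re-encode, pop. Cannot overflow from an empty stack. */
static uint32_t fpstack_roundtrip(uint32_t word) {
    fpstack s = { {0}, 0 };
    uint32_t out = 0;
    fpstack_push(&s, word);
    fpstack_exec(&s, FP_OP_DECODE);
    fpstack_exec(&s, FP_OP_ENCODE);
    fpstack_pop(&s, &out);
    return out;
}
```

Because each operation only touches the top of the stack, the processor could push several operands back to back and let the co-processor work through them while it continues executing, which is where the pipelining opportunity comes from.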