kvs wrote: Actually the current Elbrus implementations all take a big hit from 64 bit floating point compared to 32 bit. This
has nothing to do with emulation. The Elbrus simply does not have 64 bit wide FPUs.
The 8 core variant has the following official numbers:
25 operations per clock cycle in each core (8 integer, 12 floating point)
250 GFLOPS single precision, 125 GFLOPS double precision
There is a factor of two hit from using 64 bit (double precision) floating point.
And this is not an academic problem. I know from personal experience that single precision is crap for physical process
codes. If Russia wants an HPC processor, it needs to have 64 bit x 4-8 units for floating point. If the need is for some
military specific format, then have a separate model for it. Trying to satisfy contradictory constraints is a losing proposition.
It doesn't matter how you design the architecture, the hit will be there.
The same penalty applies on any Intel / AMD CPU.
Check the SSE2/SSE3/AVX implementations: every 128-bit SSE register holds either four 32-bit floats or two 64-bit floats (a 256-bit AVX register holds eight or four).
So if you switch from 32-bit to 64-bit, the penalty is 50% of the performance.
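To put numbers on that, here is a small Python sketch of the lane arithmetic. It only counts how many elements fit in one register. The names (`REGISTER_BITS`, `lanes`) are mine, for illustration, and it assumes throughput scales with the number of lanes, which is the simplification this post is making.

```python
REGISTER_BITS = 128  # width of one SSE register, as discussed above


def lanes(register_bits, element_bits):
    """Number of elements of a given width that fit in one register."""
    return register_bits // element_bits


single_lanes = lanes(REGISTER_BITS, 32)  # 32-bit floats per register
double_lanes = lanes(REGISTER_BITS, 64)  # 64-bit floats per register

# Assumption: one vector instruction processes all lanes at once,
# so relative throughput is just the ratio of lane counts.
ratio = double_lanes / single_lanes

# Ratio of the Elbrus figures quoted above (125 vs 250 GFLOPS).
elbrus_ratio = 125 / 250

print(single_lanes, double_lanes, ratio, elbrus_ratio)
```

The lane ratio and the ratio of the quoted Elbrus numbers come out the same, which is the point: the factor of two falls out of the register width, not out of any emulation.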
The reason is simple: the exponent and fraction addition/multiplication is done in hardware circuits, and the 32-bit and 64-bit operations run on the same set of transistors.
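If you want to see the exponent and fraction fields yourself, this Python sketch pulls them out of the IEEE 754 bit patterns. It uses only the standard `struct` module. The helper names `float32_fields` and `float64_fields` are made up for this post.

```python
import struct


def float32_fields(x):
    """Split a 32-bit float into (sign, 8-bit exponent, 23-bit fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & ((1 << 23) - 1)
    return sign, exponent, fraction


def float64_fields(x):
    """Split a 64-bit float into (sign, 11-bit exponent, 52-bit fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction


print(float32_fields(1.0))
print(float64_fields(1.0))
```

The exponent is stored with a bias (127 for 32-bit, 1023 for 64-bit), so 1.0 does not show up with an exponent field of zero.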
For example, addition is implemented byte-wise, and the 32/64/128-bit variants differ only in where the carry (overflow) out of each byte goes.
That means that when adding two 128-bit registers holding 2*4 32-bit floats, each with an 8-bit exponent and a 24-bit significand (23 bits stored), the adder pushes the carry from each lower byte into the byte above it, and the carry out of the top byte of each number is signalled as overflow rather than passed on to the neighbouring number.
When the same two 128-bit registers hold 2*2 64-bit floats, each number has an 11-bit exponent and a 53-bit significand (52 bits stored), and at two of those boundaries the carry is not signalled as overflow but goes into the lowest bit of the byte above.
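Here is a toy model of that idea in Python. It is an integer adder only, a sketch of the carry routing and not how a real FPU works (real floating point addition also needs exponent alignment, normalization and rounding). The function name `segmented_add` and its parameters are invented for this illustration.

```python
def segmented_add(a, b, total_bits=128, lane_bits=32):
    """Add two registers byte by byte.

    The carry moves from byte to byte inside a lane, but is cut at every
    lane boundary, so the same byte adders serve 32-bit or 64-bit lanes.
    """
    result = 0
    carry = 0
    for byte_index in range(total_bits // 8):
        shift = byte_index * 8
        if shift % lane_bits == 0:
            carry = 0  # new lane starts here: drop the carry from below
        s = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF) + carry
        result |= (s & 0xFF) << shift
        carry = s >> 8
    return result


a = 0xFFFFFFFF  # lowest 32 bits all ones
b = 1

as_32bit_lanes = segmented_add(a, b, lane_bits=32)
as_64bit_lanes = segmented_add(a, b, lane_bits=64)
print(hex(as_32bit_lanes), hex(as_64bit_lanes))
```

With 32-bit lanes the carry out of bit 31 is dropped at the lane boundary. With 64-bit lanes the same carry goes into bit 32. The byte adders are identical in both cases, only the routing of the carry changes.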
Is it clear?