Re: 4 hashes parallel on SSE2 CPUs for 0.3.6

Participants: tcatm

Quote from: satoshi on July 31, 2010, 12:29:20 AM

That’s amazing…

So are you saying you use 128-bit registers to SIMD four 32-bit data at once? I’ve wondered about that for a long time, but I didn’t think it would be possible due to addition carrying into the neighbour’s value.

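That’s how it works. Four 32-bit values in a 128-bit vector. They’re calculated independently, but at the same time. The packed 32-bit add wraps each value on its own, so a carry never spills into the neighbour’s value.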
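To illustrate the carry point, here is a minimal sketch using the standard SSE2 intrinsics from emmintrin.h. The helper name add4 is made up for this example and is not taken from the patch. It adds four pairs of 32-bit values with a single packed add, and each lane wraps modulo 2^32 without touching the lane next to it.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64 only)
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration: adds four pairs of 32-bit
// values at once using one 128-bit packed add.
std::array<uint32_t, 4> add4(const std::array<uint32_t, 4>& a,
                             const std::array<uint32_t, 4>& b) {
    // Unaligned loads, so the caller's arrays need no special alignment.
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data()));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data()));

    // PADDD: four independent 32-bit additions. The carry out of each
    // lane is discarded instead of propagating into the next lane.
    __m128i sum = _mm_add_epi32(va, vb);

    std::array<uint32_t, 4> out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), sum);
    return out;
}
```

Adding 1 to 0xFFFFFFFF in the first lane gives 0 in that lane and leaves the other three sums unaffected, which is the behaviour the SHA-256 additions need.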

Btw. Why are you using this alignup<16> function when __attribute__ ((aligned (16))) will tell the compiler to align at compile time?
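A rough sketch of the two approaches, assuming GCC or Clang. The names align_up_runtime, State and is_aligned16 are invented for this example; in particular, align_up_runtime is only a guess at what a runtime helper of this kind typically looks like, not the actual alignup<16> from the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical runtime approach: round a pointer up to the next
// multiple of N (N must be a power of two). The caller has to
// over-allocate the buffer by up to N - 1 bytes.
template <std::size_t N>
unsigned char* align_up_runtime(unsigned char* p) {
    std::uintptr_t v = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t mask = static_cast<std::uintptr_t>(N - 1);
    return reinterpret_cast<unsigned char*>((v + mask) & ~mask);
}

// Compile-time approach (GCC/Clang attribute syntax): the compiler
// is asked to place the member on a 16-byte boundary.
struct State {
    uint32_t words[16] __attribute__((aligned(16)));
};

// Small check used to compare the two approaches.
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

With the attribute there is no spare padding to allocate and no pointer arithmetic at run time, though it relies on the compiler and platform actually honouring the requested alignment for the storage in question.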