What is the need for subword parallelism?
Answers
The idea is that if your registers can hold machine words several times the size of your data type, you can pack several data elements into one register and have a single instruction operate on all of them simultaneously. A 128-bit register, for instance, can hold two 64-bit floating-point values; as long as your 'multiply' instruction is aware that the register is split in the middle, you get 2 multiplications out of 1 operation.
That's the essence of it, but everything's better with examples, so let's take our tiny computation to SWAR (SIMD within a register) on an x86 processor, after a few short disclaimers:
- Vector support is CPU-specific, so this will only work on CPUs with the right feature set. I'll use SSE2 registers because they're common. If you have an x86 processor from the Pentium 4 onward, they should be there, but we would have to write something else entirely for ARM, Power, or whatever.
- I write in C, but C doesn't let programmers specify register-level operations directly, so this would normally require inline assembly. Luckily, some compilers (like the gcc we'll be using) support language-extending intrinsics that do practically the same thing, but with nicer notation.
- This is for illustration. Try to make your compiler do the SIMD tricks for you before you resort to register-fiddling; your code will be more portable.
With all that out of the way, here is a program that
- sets up an array of 2 doubles containing the values of Pi and e,
- loads them into a register that can hold them both,
- multiplies each number with itself, and
- writes the result back to memory, so we can see that we've squared both values independently.