An Intel Programmer Jumps over Wall: First Impressions of ARM SIMD Programming

An Intel Programmer Jumps over Wall: First Impressions of ARM SIMD Programming

[ many of these missing byte operations will finally appear with SNC, assuming GFNI is fast enough; the catchily named GF2P8AFFINEQB will allow arbitrary bit permutes, thus including rotates and shifts – see Intel® Architecture Instruction Set Extensions and Future Features Programming Reference for details ]

Orthogonality of these operations is a huge relief to the programmer – it’s less work to commit things to memory, and fewer “what the hell” moments later when you realize something that should exist doesn’t. More on latencies later (I would like to praise these ARM SIMD operations more but, despite many happy claims about how fast these ops are in some architectures – e.g. A12 – I can’t find any documentation). This is something that exists spottily on Intel SIMD, but ARM allows a wide variety of operations to be either applied across the whole vector or be done as a pairwise approach.

Source: branchfree.org