Pi 4 SIMD issues

Post by rhyde

The Cortex-A72 CPU used in the Raspberry Pi 4 seems to have some issues accessing the single-precision registers (S0-S31). An instruction such as

    vmov s0, r0
runs very slowly. Storing R0 to an 8-byte memory location (with the four high-order bytes set to zero) and then loading D0 from that location is much faster, though it does wipe out S1, because D0 overlaps S0 and S1. For example, a numeric-to-hexadecimal string conversion function I wrote ran almost three times faster once it stopped using the single-precision registers. Note that this issue does not seem to occur on the Raspberry Pi 3's Cortex-A53 CPU.
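
A minimal sketch of the store-and-reload workaround, assuming 32-bit ARM code in GNU assembler syntax. The scratch stack slot and the use of R1 as a zero register are assumptions made for illustration; the original function may well have used a named 8-byte variable whose high-order word was already zero.

```asm
    @ Goal: get the 32-bit value in r0 into s0 without "vmov s0, r0".
    @ Assumes sp is 8-byte aligned on entry and that r1 is free to clobber.

    mov     r1, #0          @ r1 = 0, becomes the high-order word
    sub     sp, sp, #8      @ reserve an 8-byte scratch slot
    str     r0, [sp]        @ low-order word  = r0
    str     r1, [sp, #4]    @ high-order word = 0
    vldr    d0, [sp]        @ d0 = {0, r0}: s0 = r0, s1 = 0 (s1 is overwritten)
    add     sp, sp, #8      @ release the scratch slot
```

If the value in S1 matters, it has to be saved before the load and restored afterwards, which may cancel out some of the gain.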

Cheers,
Randy Hyde