I have a number of optimizations I’ve written for the blitters (most significantly BlitRGBtoRGBPixelAlpha) for Arm CPUs that I’d like to submit. Much of the improvement comes from performing calculations on multiple pixels in parallel. However, to achieve this, I found it necessary to change the rounding method used in the calculation for semi-transparent pixels, so that opaque pixels can be treated as a special case of semi-transparent ones. The visible effect on blitting a semi-transparent pixel is that any given colour component will sometimes differ by one LSB from the value the existing code produces.
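To illustrate the kind of difference involved, here is a minimal sketch of two ways to blend one 8-bit colour component. It is not taken from SDL or from the Arm code: the function names `blend_truncate`, `blend_round` and `check_endpoints` are hypothetical, and both formulas are assumptions chosen only to show how a rounding change can make the opaque case fall out of the general formula.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating blend (illustrative assumption): scales the source/destination
 * difference by alpha/256 and discards the fraction. At alpha = 255 the
 * result can fall one short of the source value, so an opaque pixel needs
 * its own code path to be copied exactly. */
static uint8_t blend_truncate(uint8_t s, uint8_t d, uint8_t a)
{
    /* Branch on the sign so that only non-negative values are shifted. */
    if (s >= d)
        return (uint8_t)(d + (((s - d) * a) >> 8));
    return (uint8_t)(d - (((d - s) * a) >> 8));
}

/* Rounding blend (illustrative assumption): computes
 * round((s*a + d*(255 - a)) / 255) using the shift-and-add identity for
 * division by 255, which is exact for the 16-bit range used here. */
static uint8_t blend_round(uint8_t s, uint8_t d, uint8_t a)
{
    uint32_t t = (uint32_t)s * a + (uint32_t)d * (255u - a) + 128u;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* With the rounding formula, alpha = 255 returns the source and alpha = 0
 * returns the destination for every input, with no special-case branch. */
static void check_endpoints(void)
{
    for (int s = 0; s < 256; s++) {
        for (int d = 0; d < 256; d++) {
            assert(blend_round((uint8_t)s, (uint8_t)d, 255) == s);
            assert(blend_round((uint8_t)s, (uint8_t)d, 0) == d);
        }
    }
}
```

In this sketch, blending source 255 over destination 0 at alpha 128 gives 127 with the truncating formula and 128 with the rounding one, and at alpha 255 the results are 254 and 255 respectively. A branch-free formula of the second kind is the sort of thing that lends itself to processing several pixels at once.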
So my question is: would my optimizations be acceptable even though their output is not bit-identical to that of the existing code? Would it help if I also changed the standard C version of the code to match my optimizations (even though the results of both would then differ from previous versions of SDL)?