Color change optimization

Hello everybody. I have the following method:

SDL_Texture* SDLGraphicsImpl::AddColorsToTheTexture(SDL_Renderer* renderer, SDL_Texture* texture, DrawTrianglesCmd* cmd)
{
    int w, h;
    SDL_QueryTexture(texture, nullptr, nullptr, &w, &h);

    std::vector<Uint32> pixelBuffer(w * h);
    Uint32* pixels = nullptr;

    int pitch = 0;
    if (SDL_LockTexture(texture, nullptr, reinterpret_cast<void**>(&pixels), &pitch) != 0)
    {
        ENGINE_LOG_WARNING("Failed to lock the texture for pixel modifications: %s", SDL_GetError());
        return texture;
    }

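    // SDL_Texture is opaque in the public API, so query the format instead of reading texture->format
    Uint32 textureFormat = SDL_PIXELFORMAT_UNKNOWN;
    SDL_QueryTexture(texture, &textureFormat, nullptr, nullptr, nullptr);
    SDL_PixelFormat* pixelFormat = SDL_AllocFormat(textureFormat);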
    Uint32 formatMask = pixelFormat->Rmask | pixelFormat->Gmask | pixelFormat->Bmask;

    const Uint32 rShift = pixelFormat->Rshift;
    const Uint32 gShift = pixelFormat->Gshift;
    const Uint32 bShift = pixelFormat->Bshift;

#pragma omp parallel for
    for (int i = 0; i < w * h; i++)
    {
        Uint32 pixel = pixels[i];
        const Uint8 r = (pixel & pixelFormat->Rmask) >> rShift;
        const Uint8 g = (pixel & pixelFormat->Gmask) >> gShift;
        const Uint8 b = (pixel & pixelFormat->Bmask) >> bShift;

        const Uint8 newR = (r + cmd->addition_color.Red > 255) ? 255 : r + cmd->addition_color.Red;
        const Uint8 newG = (g + cmd->addition_color.Green > 255) ? 255 : g + cmd->addition_color.Green;
        const Uint8 newB = (b + cmd->addition_color.Blue > 255) ? 255 : b + cmd->addition_color.Blue;

        pixelBuffer[i] = (newR << pixelFormat->Rshift) |
                          (newG << pixelFormat->Gshift) |
                          (newB << pixelFormat->Bshift) |
                          (pixel & ~formatMask);
    }

    SDL_FreeFormat(pixelFormat);
    SDL_UnlockTexture(texture);

    SDL_UpdateTexture(texture, nullptr, pixelBuffer.data(), w * sizeof(Uint32));

    return texture;
}

The FPS drops a lot when this runs. How can I fix it?

It’s unnecessary to use both SDL_LockTexture/SDL_UnlockTexture and SDL_UpdateTexture.

Note that SDL_LockTexture gives you write-only access to the pixels. The texture will be automatically updated when you call SDL_UnlockTexture.

SDL2/SDL_LockTexture - SDL Wiki

Lock a portion of the texture for write-only pixel access.

As an optimization, the pixels made available for editing don’t necessarily contain the old texture data. This is a write-only operation, and if you need to keep a copy of the texture data you should do that at the application level.

SDL2/SDL_UnlockTexture - SDL Wiki

Unlock a texture, uploading the changes to video memory, if needed.

That didn't give me any FPS gain. :(
The FPS drops because of this loop:

for (int i = 0; i < w * h; i++)
    {
        Uint32 pixel = pixels[i];
        const Uint8 r = (pixel & pixelFormat->Rmask) >> rShift;
        const Uint8 g = (pixel & pixelFormat->Gmask) >> gShift;
        const Uint8 b = (pixel & pixelFormat->Bmask) >> bShift;

        const Uint8 newR = (r + cmd->addition_color.Red > 255) ? 255 : r + cmd->addition_color.Red;
        const Uint8 newG = (g + cmd->addition_color.Green > 255) ? 255 : g + cmd->addition_color.Green;
        const Uint8 newB = (b + cmd->addition_color.Blue > 255) ? 255 : b + cmd->addition_color.Blue;

        pixelBuffer[i] = (newR << pixelFormat->Rshift) |
                          (newG << pixelFormat->Gshift) |
                          (newB << pixelFormat->Bshift) |
                          (pixel & ~formatMask);
    }

Is there some way to get around it?

You are adding an RGB color to each pixel and saturating at 255? In that case, why not simply SDL_RenderCopy() a 1x1 source texture containing the wanted color, using the SDL_BLENDMODE_ADD blend mode?

Your destination texture would need the SDL_TEXTUREACCESS_TARGET attribute, but so long as this doesn’t conflict with other operations you want to perform it should be super-fast.

If you really want to do it in software, you can use SIMD instructions to do the saturated addition. If you use compiler intrinsics you won’t even have to resort to assembly. x86 CPUs with SSE2 (basically all of them in use today; SSE2 was first introduced in 2000) have the PADDUSB instruction, which takes two SIMD registers and treats them as having a bunch of 8-bit unsigned values to be added together with saturation. Aside from getting a speed-up from eliminating branching, it also gives a performance increase because it can add all the color channels for the pixel at once instead of having to do them separately, and on top of that you can even do multiple pixels at once. ARM’s NEON SIMD instruction set has something similar AFAIK.

Basically, the steps are:

  • #include the x86 intrinsics header for your compiler (<emmintrin.h> covers SSE2 on GCC, Clang, and MSVC)
  • Load your pixel into one of the XMM registers, using a compiler intrinsic function from that header (for example _mm_loadu_si128, which loads four 32-bit pixels at once)
  • Load your color to add to it into another XMM register (for example with _mm_set1_epi32)
  • Call the compiler intrinsic function that translates to PADDUSB (_mm_adds_epu8)
  • Copy the new pixel value out of the destination XMM register (for example with _mm_storeu_si128)
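
The steps above could be sketched roughly like this. It is only an illustration under a few assumptions: an x86 target with SSE2, 32-bit pixels with 8 bits per channel, and the color already packed into the same pixel format as the buffer, with the alpha byte left at 0 so alpha is not changed. `AddColorSaturated` is a hypothetical name.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Adds `addend` to every pixel byte by byte, clamping each byte at 255.
void AddColorSaturated(std::uint32_t* pixels, std::size_t count, std::uint32_t addend)
{
    // Broadcast the packed color into all four 32-bit lanes of an XMM register.
    const __m128i add = _mm_set1_epi32(static_cast<int>(addend));

    std::size_t i = 0;
    for (; i + 4 <= count; i += 4)
    {
        __m128i* p = reinterpret_cast<__m128i*>(pixels + i);
        const __m128i px = _mm_loadu_si128(p);        // unaligned load of 4 pixels
        _mm_storeu_si128(p, _mm_adds_epu8(px, add));  // PADDUSB, then store back
    }

    // Scalar tail for the last 0 to 3 pixels.
    for (; i < count; ++i)
    {
        std::uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8)
        {
            const std::uint32_t sum =
                ((pixels[i] >> shift) & 0xFFu) + ((addend >> shift) & 0xFFu);
            out |= (sum > 255u ? 255u : sum) << shift;
        }
        pixels[i] = out;
    }
}
```

The color has to be packed once per call (for example with SDL_MapRGBA and alpha 0), and the pixels must come from a buffer you are allowed to read, not from the write-only pointer SDL_LockTexture hands out.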

I used the inverse of this (PSUBUSB, unsigned saturated subtraction) to do distance fading in a software raycaster once.

edit: I should point out that this is the crazy person way of doing it, but boy is it fast.

If you just want to add a color to the pixels, you can use a little trick: copy the existing texture to a new render-target texture, then draw another texture filled with the color to add on top of it. You need the ADD blend mode for that second copy, but if you want to skip the color wherever the original texture's alpha is 0, you'll have to compose a custom blend mode.
The implementation will look something like this:

auto blend_mode = SDL_ComposeCustomBlendMode(
        SDL_BLENDFACTOR_DST_ALPHA,
        SDL_BLENDFACTOR_ONE,
        SDL_BLENDOPERATION_ADD,
        SDL_BLENDFACTOR_ZERO,
        SDL_BLENDFACTOR_ONE,
        SDL_BLENDOPERATION_ADD);
...
SDL_Texture* SDLGraphicsImpl::AddColorsToTheTexture(SDL_Renderer* renderer, SDL_Texture* texture, const DrawTrianglesCmd* cmd)
{
    int width, height;
    SDL_QueryTexture(texture, nullptr, nullptr, &width, &height);
    SDL_Texture* newTexture = SDL_CreateTexture(renderer, SDL_PIXELFORMAT_RGBA8888, SDL_TEXTUREACCESS_TARGET, width, height);
    SDL_SetRenderTarget(renderer, newTexture);

    // create a 1x1 surface, fill it with color, and create a texture from it
    SDL_Surface* colorSurface = SDL_CreateRGBSurface(0, 1, 1, 32, 0, 0, 0, 0);
    SDL_FillRect(colorSurface, nullptr, SDL_MapRGBA(colorSurface->format, cmd->addition_color.Red, cmd->addition_color.Green, cmd->addition_color.Blue, 255));
    SDL_Texture* colorTexture = SDL_CreateTextureFromSurface(renderer, colorSurface);
    SDL_FreeSurface(colorSurface);

    // set the composed blend mode on the color texture, which is the one drawn on top
    SDL_SetTextureBlendMode(colorTexture, blend_mode);

    SDL_RenderCopy(renderer, texture, nullptr, nullptr);
    SDL_RenderCopy(renderer, colorTexture, nullptr, nullptr);

    SDL_SetRenderTarget(renderer, nullptr);

    SDL_DestroyTexture(colorTexture);

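    // SDL_Texture is opaque, so read the original blend mode through the API
    SDL_BlendMode originalBlendMode = SDL_BLENDMODE_NONE;
    SDL_GetTextureBlendMode(texture, &originalBlendMode);
    SDL_SetTextureBlendMode(newTexture, originalBlendMode);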
    return newTexture;
}
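
To see why the composed mode differs from plain SDL_BLENDMODE_ADD, here is a small CPU-side model of the two blend equations. It is only a sketch for reasoning about the result: `Rgba`, `AddScaled`, `BlendAdd`, and `BlendAddMaskedByDstAlpha` are hypothetical names, and the integer rounding is an assumption that a real renderer may not match exactly.

```cpp
#include <algorithm>
#include <cstdint>

struct Rgba { std::uint8_t r, g, b, a; };

// dst + src * factor / 255, clamped at 255.
inline std::uint8_t AddScaled(std::uint8_t dst, std::uint8_t src, std::uint8_t factor)
{
    const int sum = dst + (src * factor) / 255;
    return static_cast<std::uint8_t>(std::min(sum, 255));
}

// SDL_BLENDMODE_ADD: dstRGB = srcRGB * srcA + dstRGB, dstA unchanged.
inline Rgba BlendAdd(Rgba dst, Rgba src)
{
    return { AddScaled(dst.r, src.r, src.a),
             AddScaled(dst.g, src.g, src.a),
             AddScaled(dst.b, src.b, src.a),
             dst.a };
}

// The composed mode (DST_ALPHA, ONE, ADD / ZERO, ONE, ADD):
// dstRGB = srcRGB * dstA + dstRGB, dstA unchanged.
inline Rgba BlendAddMaskedByDstAlpha(Rgba dst, Rgba src)
{
    return { AddScaled(dst.r, src.r, dst.a),
             AddScaled(dst.g, src.g, dst.a),
             AddScaled(dst.b, src.b, dst.a),
             dst.a };
}
```

With an opaque destination pixel both functions give the same saturated sum. With a fully transparent destination pixel, plain ADD still changes the RGB values, while the composed mode leaves them untouched.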