Memory bug while locking texture [not leak]

Hello!
I'm seeing weird behavior when locking a texture in order to set individual pixels and then present the whole frame. It manifests as three different errors occurring in three different places in the program, and on each run only one of them happens, seemingly at random.

Context:
I am writing a simple Wolfenstein-like raycaster. I first used SDL_RenderDrawPoint and everything worked well. Then I decided to check whether writing directly to the “screen” texture would be faster.

Code snippets:
First, I have a class with a set of methods that wrap calls to SDL.
Below is part of the resource initialization:

        SDL_Window *window = SDL_CreateWindow(
                                  "Window name",
                                  SDL_WINDOWPOS_UNDEFINED,
                                  SDL_WINDOWPOS_UNDEFINED,
                                  SCREEN_WIDTH,
                                  SCREEN_HEIGHT,
                                  SDL_WINDOW_SHOWN);
        SDL_Renderer *renderer = SDL_CreateRenderer(
                                  window,
                                  -1,
                                  SDL_RENDERER_ACCELERATED);
        SDL_Texture *screen = SDL_CreateTexture(
                renderer, 
                SDL_PIXELFORMAT_RGBA8888, SDL_TEXTUREACCESS_STREAMING, 
                SCREEN_WIDTH, SCREEN_HEIGHT);

Drawing facilities are as follows:

    void update() { 
        SDL_RenderCopy(RENDERER, SCREEN, NULL, NULL);
        SDL_RenderPresent(RENDERER); 
    };

    void clear()  { 
        SDL_SetRenderDrawColor(RENDERER, 0x80, 0x80, 0x80, 0xFF); 
        SDL_RenderClear(RENDERER); 
    };

    void lock() {
        int pitch;
        SDL_LockTexture(SCREEN, NULL, (void**)&m_screen_pixels, &pitch);
        assert(pitch/4 == SCREEN_WIDTH); //this is early in the development =)
    }

    void unlock() {
        SDL_UnlockTexture(SCREEN);
        m_screen_pixels = NULL;
    }

    void setPixel(int x, int y, uint32_t color) {
        m_screen_pixels[SCREEN_WIDTH*y+x] = color; 
    }

m_screen_pixels is defined like this:

    uint32_t *m_screen_pixels {nullptr};

lock() is called at the beginning of the drawing function (each frame), and unlock() is called when this function exits. setPixel() is called inside this function every time a pixel color needs to be set.
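For reference, here is a minimal sketch of what that drawing function looks like per frame (the inner loops and the wall color are placeholders, not the actual raycasting code):

    void drawFrame() {
        lock();                                // map the streaming texture
        for (int x = 0; x < SCREEN_WIDTH; ++x) {
            for (int y = 0; y < SCREEN_HEIGHT; ++y) {
                setPixel(x, y, 0xFF0000FF);    // RGBA8888: opaque red placeholder
            }
        }
        unlock();                              // upload the changes back to the texture
        update();                              // SDL_RenderCopy + SDL_RenderPresent
    }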

The errors:
I am getting one of the three following errors each time the program is run:

  1. [1] 573049 segmentation fault (core dumped) - before the call to SDL_RenderPresent();
  2. munmap_chunk(): invalid pointer - before the call to SDL_UnlockTexture();
  3. malloc(): invalid size (unsorted) - before the call to SDL_RenderCopy().

I would appreciate any advice on what is going on.
Thank you in advance and wish you a great weekend!

Although your snippet looks right, a more complete example is needed (a minimal compilable example would be great).

It will be faster than drawing each pixel with SDL_RenderDrawPoint, but what you are trying to achieve in your code is still slow (you are literally doing software rendering in a hardware-accelerated context), because locking the texture might trigger a data transfer between GPU and RAM each time you lock/unlock.

If you really want to use this approach, check the “If your game just wants to get fully-rendered frames to the screen” section of the migration guide.

I think the real-world solution in this case is OpenGL rendering.

Hello, and thank you for replying!

I tried your suggestion but weirdly enough the problem persisted. I will try to put together a brief compilable example reproducing the problem.

By the way, isn’t SDL_UpdateTexture said to be slow by the documentation? That was the main reason I chose texture locking in the first place; it seemed to me that there were no other options left for this purpose. It does look like updating the texture saves transferring the data from GPU to RAM on lock, though.

I am mostly interested in learning SDL a bit more with this particular project, and, after all, if Carmack did it back in the good ol’ days without a GPU, chances are I can too)

I managed to fix the problem.
The thing is that I was writing to the pixel array m_screen_pixels beyond its bounds. Classic.
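For anyone finding this later, the simplest guard against this is a couple of asserts in setPixel; roughly something like this (a sketch, not necessarily how I ended up writing it):

    void setPixel(int x, int y, uint32_t color) {
        assert(x >= 0 && x < SCREEN_WIDTH);
        assert(y >= 0 && y < SCREEN_HEIGHT);
        assert(m_screen_pixels != nullptr);    // the texture must be locked first
        m_screen_pixels[SCREEN_WIDTH*y + x] = color;
    }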

I suggested SDL_UpdateTexture from the migration guide because it’s more straightforward than locking and is fast enough for rendering at 60+ fps.
If you want to learn SDL, you can try rewriting your program to use SDL functions instead of all the hard math you currently do in your code. You can use the SDL_RenderGeometry and SDL_RenderGeometryRaw functions to implement your raycaster.
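For completeness, the SDL_UpdateTexture route looks roughly like this (a sketch that reuses the names from your first post; `pixels` is a RAM-side buffer you fill yourself):

    // RAM-side frame buffer, filled by the raycaster every frame
    std::vector<uint32_t> pixels(SCREEN_WIDTH * SCREEN_HEIGHT);

    // ... fill `pixels` ...

    // copy the whole buffer into the texture in one call, then present
    SDL_UpdateTexture(screen, NULL, pixels.data(), SCREEN_WIDTH * sizeof(uint32_t));
    SDL_RenderCopy(renderer, screen, NULL, NULL);
    SDL_RenderPresent(renderer);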

Glad you solved your problem tho, good luck with your beginnings!

To be fair, I haven’t noticed a significant performance difference between SDL_UpdateTexture and Lock/Unlock with my particular app (composing pixels and spitting them out on the screen once every frame).

Wow, I didn’t know SDL had arbitrary 2D geometry rendering capabilities, that’s neat, thank you for showing that! So basically I can move the interpolation step for texture sampling there and cast rays only to find the coordinates of the vertices?

I’m sorry if I’m bothering you too much with questions unrelated to the topic, but do the vertices in the geometry rendering functions carry coordinates in window space or logical space? The docs talk about SDL_Renderer coordinates, so I presume it could be either of the two.

Yes, you can do rasterization for your raycaster using SDL_RenderGeometry

The points you set in the vertices are coordinates on your renderer.
Note that the renderer size is not guaranteed to equal the window size (for example, it is possible to have a bigger rendering context on high-DPI displays), so use SDL_GetRendererOutputSize to check it, or SDL_RenderSetLogicalSize to set a device-independent resolution; but I think ignoring this is absolutely safe.
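To make the coordinates concrete, here is a rough sketch of drawing one textured wall column as a quad (two triangles) with SDL_RenderGeometry; positions are in renderer output coordinates and tex_coord is normalized to [0..1] (column_x, top, bottom, u and wall_texture are made-up placeholders, not part of your code):

    float column_x = 100.0f, top = 40.0f, bottom = 200.0f;  // from the raycaster (placeholder values)
    float u = 0.25f;                                         // horizontal texture coordinate hit by the ray
    SDL_Color white = {255, 255, 255, 255};

    // one vertical wall slice, 1 pixel wide, from `top` to `bottom`
    SDL_Vertex v[4] = {
        {{column_x,        top},    white, {u,         0.0f}},
        {{column_x + 1.0f, top},    white, {u + 0.01f, 0.0f}},
        {{column_x + 1.0f, bottom}, white, {u + 0.01f, 1.0f}},
        {{column_x,        bottom}, white, {u,         1.0f}},
    };
    int indices[6] = {0, 1, 2, 0, 2, 3};                     // two triangles forming the quad
    SDL_RenderGeometry(renderer, wall_texture, v, 4, indices, 6);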

No problem with your questions at all, feel free to ask anything SDL-related here, but I would suggest creating a new topic if your new question is largely unrelated to the previous one.
Wish you happy rewriting :grin:!

If you want to, you can also divide your CPU rendering into a few threads (SDL_GetCPUCount will be useful) and fill the pixel buffer, then upload the array to the texture in VRAM, and finally render everything like UI, HUD etc. using the GPU (SDL_Render* API). You can also optimize this by having a small back buffer for raycasting (like 320×240, if your game must be pixelated) and then just upscaling it to the window size using SDL_RenderCopy.
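A rough sketch of the small back buffer idea, assuming a 320×240 buffer and the names from the original snippets:

    // low-res target; the raycaster only has to fill 320*240 pixels
    SDL_Texture *back_buffer = SDL_CreateTexture(
            renderer,
            SDL_PIXELFORMAT_RGBA8888, SDL_TEXTUREACCESS_STREAMING,
            320, 240);

    // each frame: upload the RAM buffer, then stretch it over the whole window
    SDL_UpdateTexture(back_buffer, NULL, pixels.data(), 320 * sizeof(uint32_t));
    SDL_RenderCopy(renderer, back_buffer, NULL, NULL);   // NULL dst rect = scale to the window
    SDL_RenderPresent(renderer);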

But of course, you can also play with SDL_RenderGeometry. You have many possibilities to do what you need.

Hey, thanks for the clarifications and for your time, I’m learning a lot from you! The only thing left now is to apply this to the program)
Wish you a wonderful day too!

Hey, thank you for stopping by and for good advice!

It seems like I can still process my vertices in parallel, putting them into the array for SDL_RenderGeometry, but that seems to be a wee bit of overkill for this particular app.

The small back buffer is probably the thing I should look into first, because there is really no point in going through all the pixels of the screen if the textures are rather low-res.

AFAIK the SDL renderer API is not multithreaded, so you should render things only in the window thread, the same one that processes events.

If you are doing a retro-style game (strongly pixelated), rendering every pixel of the window is a big waste of time in the case of software rendering. Having a small back buffer is a strong optimization, and not only that: it will also allow you to simplify the rendering calculations. Filling the pixel buffer with many threads will speed up rendering significantly.


I’m using this approach in the early version of my engine. The game will be retro-style, pixelated, at a 358×224 resolution, so the pixel array (in RAM) will be that size. During rendering of the game frame, the pixel buffer is first filled using as many threads as there are CPU cores (no syncing is needed, just render row by row, one row per thread). I prefer to use the CPU because I’m interested in raytracing, so it will be much easier to implement on the CPU side.
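A minimal sketch of that row-per-thread fill, assuming interleaved row assignment and threads spawned per frame for simplicity (a real engine would reuse a thread pool; render_pixel is a hypothetical stand-in for the raycaster):

    #include <thread>
    #include <vector>

    void fill_buffer(uint32_t *pixels, int width, int height) {
        int num_threads = SDL_GetCPUCount();
        std::vector<std::thread> workers;
        for (int t = 0; t < num_threads; ++t) {
            workers.emplace_back([=]() {
                // thread t handles rows t, t + num_threads, t + 2*num_threads, ...
                for (int y = t; y < height; y += num_threads)
                    for (int x = 0; x < width; ++x)
                        pixels[y * width + x] = render_pixel(x, y);  // hypothetical raycaster call
            });
        }
        for (auto &w : workers) w.join();
    }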

Then the pixel data will be transferred to VRAM (updating the back buffer texture, also 358×224 in size) and all other things will be rendered onto this texture using the GPU (SDL_Render* API). Then upscale to the window size, do some post-processing (like glow, blur, a CRT filter etc., maybe using shaders (in SDL3)), render the final texture in the window (keeping the desired aspect ratio) and flip.

This should be highly efficient. If you’re interested in raycasting the world on the CPU side, I recommend doing it this way. The fewer pixels the CPU has to fill, the better for overall performance. However, sprites should be rendered using the GPU, also for performance’s sake.

I guess so, but I can still populate the array needed for this in parallel, at least in theory.

BTW, won’t splitting the rendering surface into chunks be faster? Say, if you have 8 rows and 4 cores, you could give each thread 2 consecutive rows so the memory accesses are denser.

I’m not sure I’d be able to use a z-buffer in that case for determining whether the sprites are half-hidden by a wall.

That’s exactly my point. If you choose to use the SDL_Render* API you will be limited to the window thread. However, if you use CPU-side rendering, you’ll be able to do it in parallel.

Yes, but it would have negative consequences. If the buffer is divided into equal areas in the form of horizontal stripes (as many areas as CPU cores), then the greater the difference in the number of objects between these areas, the greater the difference in fill time between threads, and therefore the longer it will take to render the entire frame. To compensate for these differences, instead of filling a whole large area in a given thread, it’s better to give each thread one row of pixels at a time to fill.

Imagine that you need to render a lot of objects at the top of the screen, and only a few at the bottom of the screen. The thread that will render the top area will run many times longer than the one that fills the area at the bottom of the frame. So one CPU core will definitely have more work to do than another. On the other hand, if you do it row-per-thread, all cores will perform a similar amount of calculations, which will minimize the demand on CPU time and thus shorten the frame rendering time.

I shouldn’t have used the word “sprite”, because apparently it is misleading.

What I meant was that your raycaster should render the game world, while frame elements like HUD, menus, dialog balloons, etc., things that don’t require raycasting, should be rendered using textures and the SDL_Render* API. This will give you high performance and convenience.

So the pipeline would look like this (a rough code sketch follows the list):

  • fill an array of pixels using raycasting (if you can, in multiple threads),
  • update the texture of the back buffer by copying the contents of the array to it,
  • render the rest of the frame using textures,
  • do whatever you want with the backbuffer texture and finally render it in a window.
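
Put together, the per-frame loop could look something like this (a sketch with invented names; raycast_world, hud_texture and hud_rect stand in for your own code, and here the HUD is drawn after the upscale rather than onto the back buffer itself):

    // 1. fill the RAM pixel buffer with the raycast view (possibly multi-threaded)
    raycast_world(pixels.data(), BUFFER_W, BUFFER_H);

    // 2. upload the buffer to the low-res back buffer texture in VRAM
    SDL_UpdateTexture(back_buffer, NULL, pixels.data(), BUFFER_W * sizeof(uint32_t));

    // 3. upscale the back buffer to the window size
    SDL_RenderCopy(renderer, back_buffer, NULL, NULL);

    // 4. render HUD / UI textures on top, then present
    SDL_RenderCopy(renderer, hud_texture, NULL, &hud_rect);
    SDL_RenderPresent(renderer);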

But if you are good with affine mapping, then you can calculate only the vertices in parallel and then do SDL_RenderGeometry. So the rendering itself would use the API while the CPU load is highly reduced. Not sure if that is feasible in your situation.

Got it, thanks! So the only concern is to use a thread pool carefully so that threads are not recreated every frame, but I guess that holds true for my approach too and is sort of implied.

Ah, yeah, makes it much easier to blend things nicely over the screen.


It seems like this thread (no pun intended) is drifting far away from its topic. We can start a new one or move to DMs.