Sw render Clip issue

Hi, I seem to have a problem that setting a clipping rectangle
on software renderer is corrupting something and causing a crash.

  1. My code works without the clipping.

  2. I get various crashes if I call SDL_SetClipRect,
    including a diagnostic detecting a corrupted checksum on free().
    The crash usually occurs inside my garbage collector.

This kind of crash when using my system with C like code is usually
due to a fault in the application C like code, although occasionally
a bug in the collector or library data types is found.

However what’s suspicious here is that I reliably get an crash
when I call SDL_SetClipRect, and not otherwise.

So the indication is that it is an issue in the software renderer.

  1. I removed all calls to DrawLine, it still crashes. Which is weird.
    The rest of my drawing is blitting and doesn’t involve the software
    rendering, its directly to the window surface.

I suspect this is the problem: I’m using the renderer AND blitting
directly to the surface. However I can’t see why that shouldn’t
work. I’m not using the rendering API on the window, I’m using
it on the surface of the window: the header says:

  • \note You may not combine this with 3D or the rendering API on this window.
    extern DECLSPEC SDL_Surface * SDLCALL SDL_GetWindowSurface(SDL_Window * window);

Of course I could use the renderer by converting the source surfaces I blit
to textures, but I don’t see why I should have to do that.

Any hints what could be going wrong?

I have looked briefly at the software rendering code and can’t see
any obvious issues.–
john skaller
@john_skaller
http://felix-lang.org

Hi, I seem to have a problem that setting a clipping rectangle
on software renderer is corrupting something and causing a crash.

Ok, I have been looking at the source and I’m not surprised now.
I haven’t located the problem but the code is suspicious.

Here’s the issue. When you set clipping rect, it is intersected
with the surface rect and the intersection is stored. A flag says
if the result has any area or not.

The invariant for SDL_Rect is that negative heights and widths
imply an empty rectangle.

However, the blit functions do NOT check this invariant! They only
check if the rect pointer is NULL.

So if you set a clip rect outside the window, all hell will break lose.
A loop from x to x + h will scan all of memory except for the h entries
before x, certainly causing a crash.

SDL_SetClipRect ends with:

return SDL_IntersectRect(rect, &full_rect, &surface->clip_rect);

but this stores the result into the cliprect EVEN IF it is an “error”.

Unless there’s a reason to have negative widths and heights,
SDL_IntersectRect should be fixed to put 0 in the width and height
if the rectangle is empty. This would mean most properly designed
loops would work, without needing any check.

The code for SDL_IntersectRect is really badly bugged.

First, it checks for NULL pointers and gives SDL_FALSE if one of the
input arguments is NULL. But it doesn’t set the result!

Then it checks if either argument is empty, if so it DOES set the
result.

Then it calculates the intersection and sets the result, including
negative numbers for the width and height.

This is very confused code. The invariant is hard to determine!

If you set a clip rectangle with a NULL pointer the whole target is
used, but if it is otherwise empty the clipping rectangle is set to
a value which means empty, but requires a test for the negative
width or height to establish this.

/* clip the destination rectangle against the clip rectangle */
{
    SDL_Rect *clip = &dst->clip_rect;
    int dx, dy;

    dx = clip->x - dstrect->x;
    if (dx > 0) {
        w -= dx;
        dstrect->x += dx;
        srcx += dx;
    }
    dx = dstrect->x + w - clip->x - clip->w;
    if (dx > 0)
        w -= dx;

    dy = clip->y - dstrect->y;
    if (dy > 0) {
        h -= dy;
        dstrect->y += dy;
        srcy += dy;
    }
    dy = dstrect->y + h - clip->y - clip->h;
    if (dy > 0)
        h -= dy;
}

As you can see, this code “just uses clip->w” which could be negative.
Perhaps it works, its too tricky for me to tell!

Later on it does this:

if (w > 0 && h > 0) {
    SDL_Rect sr;
    sr.x = srcx;
    sr.y = srcy;
    sr.w = dstrect->w = w;
    sr.h = dstrect->h = h;
    return SDL_LowerBlit(src, &sr, dst, dstrect);
}
dstrect->w = dstrect->h = 0;
return 0;

but it isn’t clear the adjusted w value will be non-positive when it needs
to be to suppress the lower (unchecked) blit operation.

I REALLY wish this code were C++ because it is so much better at
enforcing invariants.

BTW: My clip rect causing the crash is interior to the surface so this doesn’t
explain my crash.On 28/07/2013, at 12:48 PM, john skaller wrote:


john skaller
@john_skaller
http://felix-lang.org

Hi, I seem to have a problem that setting a clipping rectangle
on software renderer is corrupting something and causing a crash.

Ok, I have been looking at the source and I’m not surprised now.
I haven’t located the problem but the code is suspicious.

Valgrind finds this:

==290== Invalid write of size 4
==290== at 0x100924FF8: SDL_IntersectRect (in /usr/local/lib/libSDL2-2.0.0.dylib)
==290== by 0x100926BA0: SDL_SetClipRect (in /usr/local/lib/libSDL2-2.0.0.dylib)
==290== by 0x1008F426D: SW_UpdateClipRect (in /usr/local/lib/libSDL2-2.0.0.dylib)
==290== by 0x11A403579: flxusr::edit_display::draw::resume() (in /Users/johnskaller/felix/demos/sdl/edit_display.dylib)On 29/07/2013, at 1:40 AM, john skaller wrote:

On 28/07/2013, at 12:48 PM, john skaller wrote:


john skaller
@john_skaller
http://felix-lang.org

john skaller :

Hi, I seem to have a problem that setting a clipping rectangle
on software renderer is corrupting something and causing a crash.

[…]

Regarding that and your follow-ups on that one, can you please provide
a stripped down example that could cause the issue? I experienced
some memory corruptions in the past with the software clipping, but was
never able to reproduce them.
It’d help to create (or let the developers create) a fix, though.

Cheers
Marcus

john skaller :

Hi, I seem to have a problem that setting a clipping rectangle
on software renderer is corrupting something and causing a crash.

Ok, I have been looking at the source and I’m not surprised now.
I haven’t located the problem but the code is suspicious.

Here’s the issue. When you set clipping rect, it is intersected
with the surface rect and the intersection is stored. A flag says
if the result has any area or not.

The invariant for SDL_Rect is that negative heights and widths
imply an empty rectangle.

However, the blit functions do NOT check this invariant! They only
check if the rect pointer is NULL.

So if you set a clip rect outside the window, all hell will break lose.
A loop from x to x + h will scan all of memory except for the h entries
before x, certainly causing a crash.

SDL_SetClipRect ends with:

return SDL_IntersectRect(rect, &full_rect, &surface->clip_rect);

but this stores the result into the cliprect EVEN IF it is an “error”.

Unless there’s a reason to have negative widths and heights,
SDL_IntersectRect should be fixed to put 0 in the width and height
if the rectangle is empty. This would mean most properly designed
loops would work, without needing any check.

The code for SDL_IntersectRect is really badly bugged.

First, it checks for NULL pointers and gives SDL_FALSE if one of the
input arguments is NULL. But it doesn’t set the result!

Correct - and I would not want it otherwise. It does not MODIFY the
passed in result would be more appropriate to say.

Then it checks if either argument is empty, if so it DOES set the
result.

Yes, it modifies the result to be empty, which, in my opinion is
correct, too. Two empty bodies can not intersect and hence the
result must be empty (w and h = 0).

Then it calculates the intersection and sets the result, including
negative numbers for the width and height.

This is the real problem. The question would be: how can it happen
that the width or height become negative? Which (once more) makes me wonder,
why SDL_Rect uses a signed int for the width and height instead of unsigned.
This is another story, though.

This is very confused code. The invariant is hard to determine!

If you set a clip rectangle with a NULL pointer the whole target is
used, but if it is otherwise empty the clipping rectangle is set to
a value which means empty, but requires a test for the negative
width or height to establish this.

It works as documented - the only problem is the negative width and
height in my opinion.

Cheers
Marcus> On 28/07/2013, at 12:48 PM, john skaller wrote:

First, it checks for NULL pointers and gives SDL_FALSE if one of the
input arguments is NULL. But it doesn’t set the result!

Correct - and I would not want it otherwise. It does not MODIFY the
passed in result would be more appropriate to say.

Yes, but it isn’t clear if this is consistent.

Exactly what does this code mean??

if (!A) {
    SDL_InvalidParamError("A");
    return SDL_FALSE;
}

If it’s an error, it should abort the program (in C++ throw
an exception). So how can it return SDL_FALSE?

Then it checks if either argument is empty, if so it DOES set the
result.

Yes, it modifies the result to be empty, which, in my opinion is
correct, too. Two empty bodies can not intersect and hence the
result must be empty (w and h = 0).

Then it calculates the intersection and sets the result, including
negative numbers for the width and height.

This is the real problem. The question would be: how can it happen
that the width or height become negative?

A more fundamental question is what does it mean to have
a negative height or width. There is a perfectly acceptable answer
to this: the point is on the right/bottom and excluded, and that’s
the proper mathematical interpretation. But it does NOT agree with
code that checks for empty rectangle. However it does agree with
the code that calculates the intersection (I think anyhow, the code
is hard to follow … I had to run test cases in a copy of the code
written in Python to figure it out :slight_smile:

So the code is inconsistent. Because it’s C and you cannot write
this code without at least making the required invariants
clear in comments.

Which (once more) makes me wonder,
why SDL_Rect uses a signed int for the width and height instead of unsigned.
This is another story, though.

There’s a good reason for that. Using an unsigned value in a woefully
badly typed language like C would just lead to a silent conversion,
say of -1 to 65534 if int happened to be 16 bit, and that would be
a disaster and not detectable.

It’s better to use int, and then check the signed value is non-negative.
Generally speaking unsigned integers are a hack and have no place
at all in application programming: they’re ONLY useful in kernels
and very low level software that does things like scan all memory.
For the kind of reason above. Even if modular arithmetic is useful,
modulo a power of 2 is one of the most useless bases (modulo
a prime is more useful because the structure is a finite field).

But as you say, another story.

This is very confused code. The invariant is hard to determine!

If you set a clip rectangle with a NULL pointer the whole target is
used, but if it is otherwise empty the clipping rectangle is set to
a value which means empty, but requires a test for the negative
width or height to establish this.

It works as documented - the only problem is the negative width and
height in my opinion.

As documented where???On 30/07/2013, at 9:35 PM, mva at sysfault.org wrote:


john skaller
@john_skaller
http://felix-lang.org

john skaller :

First, it checks for NULL pointers and gives SDL_FALSE if one of the
input arguments is NULL. But it doesn’t set the result!

Correct - and I would not want it otherwise. It does not MODIFY the
passed in result would be more appropriate to say.

Yes, but it isn’t clear if this is consistent.

Exactly what does this code mean??

if (!A) {
SDL_InvalidParamError(“A”);
return SDL_FALSE;
}

If it’s an error, it should abort the program (in C++ throw
an exception). So how can it return SDL_FALSE?

It sets an error and returns SDL_FALSE, no need to abort anything.
The doc comment should be more clear about that, true.

[…]

So the code is inconsistent. Because it’s C and you cannot write
this code without at least making the required invariants
clear in comments.

The comments could be more clear, the “because it’s C” is nonsense.

Which (once more) makes me wonder,
why SDL_Rect uses a signed int for the width and height instead of unsigned.
This is another story, though.

There’s a good reason for that. Using an unsigned value in a woefully
badly typed language like C would just lead to a silent conversion,
say of -1 to 65534 if int happened to be 16 bit, and that would be
a disaster and not detectable.

Blame your compiler flags for that, not the language. maybe blame the
developer for not properly dealing with any value.

This is very confused code. The invariant is hard to determine!

If you set a clip rectangle with a NULL pointer the whole target is
used, but if it is otherwise empty the clipping rectangle is set to
a value which means empty, but requires a test for the negative
width or height to establish this.

It works as documented - the only problem is the negative width and
height in my opinion.

As documented where???

as documented SDL_SetClipRect doc comment (SDL_Surface.h):

/**

  • Sets the clipping rectangle for the destination surface in a blit.> On 30/07/2013, at 9:35 PM, @Marcus_von_Appen wrote:
  • If the clip rectangle is NULL, clipping will be disabled.
  • If the clip rectangle doesn’t intersect the surface, the function will
  • return SDL_FALSE and blits will be completely clipped. Otherwise the
  • function returns SDL_TRUE and blits to the surface will be clipped to
  • the intersection of the surface area and the clipping rectangle.
  • Note that blits are automatically clipped to the edges of the source
  • and destination surfaces.
    */

Cheers
Marcus

john skaller <@john_skaller>:

Hi, I seem to have a problem that setting a clipping rectangle
on software renderer is corrupting something and causing a crash.

[…]

Regarding that and your follow-ups on that one, can you please provide
a stripped down example that could cause the issue?

I don’t know if I can or not. My code is written in Felix, which generates
C++ corresponding to ordinary C use of SDL. However the program
consists of a standard mainline, four shared libraries loaded under program
control, and about 10 other shared libraries autoloaded by the linker,
(including SDL and freetype) and includes a garbage collector,
asynchronous I/O support and other stuff.

The crash invariably occurs during garbage collection, because that’s
a time when a lot of memory is scanned and unused memory freed.
The SDL data structures involved are NOT garbage collected,
but if an SDL function corrupted some Felix GC memory the
GC’s operation (including free()ing garbage) could trigger
an access violation.

So if I just write a simple cut down C program that does a single
blit to a surface and set a clipping rectangle it probably won’t
crash :slight_smile: If it would crash the problem would have been found
by other ages ago.

I experienced
some memory corruptions in the past with the software clipping, but was
never able to reproduce them.
It’d help to create (or let the developers create) a fix, though.

I have looked at the code, and I have to repeat that in my actual
use case there is NO issue of non-intersection, even though I am
suspicious of that code. My clipping region is strictly interior
to the surface.

Tracing through the actual sequence of calls indicated in another post
as leading to an access violation, as shown by Valgrind, I cannot see anything
at all wrong in the SDL C source code. If that was my only data I’d say that I have
screwed up one of the pointers eg to the renderer or window surface.

The problem is that if I remove the setting of the clipping region everything
actually works, and because of my very lame inefficient code that
includes MANY calls to the garbage collector.

Heck, the bug could be in my C compiler (I’m using Clang 3.3 from SVN,
but SDL is built with Apple’s gcc 4.2).

[Hmm … actually MY code uses Clang’s C/C++ libraries but SDL is built
against GNU’s C libraries … interesting … arrggh … :]

I had a problem in Felix itself before, because Felix uses a lot of "reinterpetations"
of store which breaks strict aliasing rules.

Given the nature of SDL its very likely SDL breaks the C Standard too,
but the build script neglects to turn strict aliasing optimisations off.
This is mandatory in Clang and gcc 4.2 with even -O1. These compilers
really do perform type based optimisations that assume your code does
not do any type punning. Type punning is not allowed in ISO C.

So the bottom line here is: I don’t think I can provide any cut down code
demonstrating the fault until I have a better understanding what the problem
actually is. And in that case I can probably demonstrate the problem by
reference to the source and fix it. Since my own code is quite complex,
and is written in another language which is translated to C++, making
the generated code also hard to follow, I cannot rule out an error in my
own code.

I’m just suspicious because SDL_RenderSetClipRect is the
ONLY function that causes a problem. Setting the scale doesn’t.
Removing the clipping fixes the problem.

Valgrind says this:

==327== Invalid read of size 4
==327== at 0x100924FD9: SDL_IntersectRect (in /usr/local/lib/libSDL2-2.0.0.dylib)
==327== by 0x100926BA0: SDL_SetClipRect (in /usr/local/lib/libSDL2-2.0.0.dylib)
==327== by 0x1008F426D: SW_UpdateClipRect (in /usr/local/lib/libSDL2-2.0.0.dylib)
==327== by 0x11F24CFB9: flxusr::edit_display::draw::resume() (in /Users/johnskaller/felix/demos/sdl/edit_display.dylib)

and I’ve seen an invalid write too.

The thing is, valigrind isn’t reliable (it reports these things in code I know is correct).

However under valgrind my code does NOT crash.

Under gdb it crashes in many different ways, always in the garbage collector,
but for many different reasons (indicating corruption).On 30/07/2013, at 8:53 PM, mva at sysfault.org wrote:


john skaller
@john_skaller
http://felix-lang.org

Exactly what does this code mean??

if (!A) {
SDL_InvalidParamError(“A”);
return SDL_FALSE;
}

If it’s an error, it should abort the program (in C++ throw
an exception). So how can it return SDL_FALSE?

It sets an error and returns SDL_FALSE, no need to abort anything.
The doc comment should be more clear about that, true.

There is clearly some need, otherwise SDL_InvalidParamError (“A”)
would not be there.

So the code is inconsistent. Because it’s C and you cannot write
this code without at least making the required invariants
clear in comments.

The comments could be more clear, the “because it’s C” is nonsense.

Because it is C there is no way to express the invariants
other than in comments. In C++ you can use a class with private
members and enforce the invariant with code in the constructor.

Expressing the invariants is necessary for understanding
the code (doing so is part of the programmers job).

Hence my comment is not nonsense at all, in fact it is
logically derivable (i.e. provable) from basic assumptions.

Which (once more) makes me wonder,
why SDL_Rect uses a signed int for the width and height instead of unsigned.
This is another story, though.

There’s a good reason for that. Using an unsigned value in a woefully
badly typed language like C would just lead to a silent conversion,
say of -1 to 65534 if int happened to be 16 bit, and that would be
a disaster and not detectable.

Blame your compiler flags for that, not the language. maybe blame the
developer for not properly dealing with any value.

I blame the language too.

As documented where???

as documented SDL_SetClipRect doc comment (SDL_Surface.h):

/**

  • Sets the clipping rectangle for the destination surface in a blit.
  • If the clip rectangle is NULL, clipping will be disabled.
  • If the clip rectangle doesn’t intersect the surface, the function will
  • return SDL_FALSE and blits will be completely clipped. Otherwise the
  • function returns SDL_TRUE and blits to the surface will be clipped to
  • the intersection of the surface area and the clipping rectangle.
  • Note that blits are automatically clipped to the edges of the source
  • and destination surfaces.
    */

These comments are not complete documentation because they do not
tell what happens if the input SDL_Rect contains negative width
or height.

A statement about the meaning of that condition actually belongs
in SDL_rect.h IMHO. Are they legal values, just representing empty
rectangles?

As mentioned before there’s a reason to ask because the code
that actually uses the clip rect DOES NOT CHECK and will crash
if this is the case but SDL_RenderSetClipRect creates precisely
such a value as the cliprect when attached to a software renderer
because it calls SDL_IntersectRect.

In particular in video/SDL_blit.c for example, routine
SDL_SoftBlit we have this:

if (okay && srcrect->w && srcrect->h) {


info->dst_w = dstrect->w;
info->dst_h = dstrect->h;

so this code checks for 0 width and height but not negative.

And somewhere in SDL_blit_copy.c we have:

while (h--) {
    SDL_memcpy(dst, src, w);
    src += srcskip;
    dst += dstskip;
}

which CLEARLY fails if h or w is negative.

There are enough layers of code here I cannot prove an error path
without a bit more analysis, however as i mentioned I don’t think this
particular problem is causing my crash.On 30/07/2013, at 11:31 PM, mva at sysfault.org wrote:


john skaller
@john_skaller
http://felix-lang.org

Valgrind finds this:

==290== Invalid write of size 4
==290== at 0x100924FF8: SDL_IntersectRect (in /usr/local/lib/libSDL2-2.0.0.dylib)
==290== by 0x100926BA0: SDL_SetClipRect (in /usr/local/lib/libSDL2-2.0.0.dylib)
==290== by 0x1008F426D: SW_UpdateClipRect (in /usr/local/lib/libSDL2-2.0.0.dylib)
==290== by 0x11A403579: flxusr::edit_display::draw::resume() (in /Users/johnskaller/felix/demos/sdl/edit_display.dylib)

The bug is in SDL_render_sw.c, line 350:

static int
SW_UpdateClipRect(SDL_Renderer * renderer)
{
const SDL_Rect rect = &renderer->clip_rect;
SDL_Surface
framebuffer = (SDL_Surface *) renderer->driverdata; //<<— BUG

Thats WRONG. The driverdata of a renderer does NOT point at an SDL_Surface.
It points at a SW_RenderData. This code, only a a few lines above, is correct:

static int
SW_UpdateViewport(SDL_Renderer * renderer)
{
SW_RenderData *data = (SW_RenderData *) renderer->driverdata;
SDL_Surface *surface = data->surface;

Another case of lousy C type system. The reason for this mistake is also quite
clear: this code is correct:

    data->surface = (SDL_Surface *) texture->driverdata;

i.e. the driver data for a TEXTURE is just a surface. Whoever typed that in so many
times typed it in again for a renderer by mistake.

[Reminds me of the old joke: all problems in computer science can be
solved by adding one more level of indirection]On 30/07/2013, at 7:30 PM, john skaller wrote:


john skaller
@john_skaller
http://felix-lang.org

I think you may have a conflict in your source tree or it is outdated, it
seems this bug is already fixed:

Line 350 differs from what you posted:

http://hg.libsdl.org/SDL/annotate/66f9d44f9717/src/render/software/SDL_render_sw.c#l350

2013/7/30 john skaller >

On 30/07/2013, at 7:30 PM, john skaller wrote:

Valgrind finds this:

==290== Invalid write of size 4
==290== at 0x100924FF8: SDL_IntersectRect (in
/usr/local/lib/libSDL2-2.0.0.dylib)
==290== by 0x100926BA0: SDL_SetClipRect (in
/usr/local/lib/libSDL2-2.0.0.dylib)
==290== by 0x1008F426D: SW_UpdateClipRect (in
/usr/local/lib/libSDL2-2.0.0.dylib)
==290== by 0x11A403579: flxusr::edit_display::draw::resume() (in
/Users/johnskaller/felix/demos/sdl/edit_display.dylib)

The bug is in SDL_render_sw.c, line 350:

static int
SW_UpdateClipRect(SDL_Renderer * renderer)
{
const SDL_Rect rect = &renderer->clip_rect;
SDL_Surface
framebuffer = (SDL_Surface *) renderer->driverdata;
//<<— BUG

Thats WRONG. The driverdata of a renderer does NOT point at an SDL_Surface.
It points at a SW_RenderData. This code, only a a few lines above, is
correct:

static int
SW_UpdateViewport(SDL_Renderer * renderer)
{
SW_RenderData *data = (SW_RenderData *) renderer->driverdata;
SDL_Surface *surface = data->surface;

Another case of lousy C type system. The reason for this mistake is also
quite
clear: this code is correct:

    data->surface = (SDL_Surface *) texture->driverdata;

i.e. the driver data for a TEXTURE is just a surface. Whoever typed that
in so many
times typed it in again for a renderer by mistake.

[Reminds me of the old joke: all problems in computer science can be
solved by adding one more level of indirection]


john skaller
skaller at users.sourceforge.net
http://felix-lang.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


Gabriel.

2013/7/30, john skaller :

There’s a good reason for that. Using an unsigned value in a woefully
badly typed language like C would just lead to a silent conversion,
say of -1 to 65534 if int happened to be 16 bit, and that would be
a disaster and not detectable.

65535, and that assumes two’s complement and nothing getting in the
way, the specification states that it’s undefined behavior and that
means that if the compiler feels like it that it can add an overflow
check and then crash the program or delete root or set the monitor on
fire or marry your wife or whatever it wants and it’d still be valid.

(in practice compilers will leave overflow in, but they will remove
any checks for the overflow that rely on the undefined behavior,
effectively meaning you have to be extremely careful about how you
check for overflow)

(also the above is for signed integers - unsigned integers have well
defined behavior and it’s wraparound, so at least with those it’s
easier to handle)

2013/7/30, john skaller :

I blame the language too.

Given the specification explicitly states it’s undefined behavior,
it’s the language’s fault actually >_>

I think you may have a conflict in your source tree or it is outdated, it seems this bug is already fixed:

http://hg.libsdl.org/SDL/rev/5b94da2650a6

Line 350 differs from what you posted:

http://hg.libsdl.org/SDL/annotate/66f9d44f9717/src/render/software/SDL_render_sw.c#l350

Yes indeed, this is what happens when you have TWO copies of SDL, one first built
from the tarball, and then another build from Mercurial.

So I have been looking at the source for a different version that the installed
binary, however it seems that even my Mercurial repo was out of date
(I can’t tell now because I updated it :slight_smile:

Lets see if a rebuild fixes it … yes, that fixes it :slight_smile:

Whew. Thanks everyone who read all this for patience.On 31/07/2013, at 9:23 AM, Gabriel Jacobo wrote:


john skaller
@john_skaller
http://felix-lang.org