Android app stability

My app, built with SDL2 (I can’t use SDL3), is generally very stable on all platforms apart from Android. Despite taking - as far as I know - all the additional measures required for Android (for example handling the SDL_APP_* and SDL_RENDER_* messages) I still occasionally see crashes, incorrectly scaled/cropped output and failure to recover from switching to the background and back.

Is there anything else I can try that might improve stability in Android? My app is already structured to be able to run in Emscripten / WebAssembly (for example it supports the emscripten_set_main_loop() callback) so could that capability beneficially be leveraged in Android too?

What sort of “User-perceived crash rate” do you get? My game’s on about 0.5%, mostly SIGSEGV and SIGABRT in [split_config.arm64_v8a.apk] so no real indication to me what’s happened.

Unfortunately I’m getting “Not enough data for the selected configuration” for that metric (even if I select the longest period of 3 years).

One specific thing I’ve noticed is that sometimes my app opens with its output cropped, as if SDL2 is reporting the screen size in the SDL_WINDOWEVENT_RESIZED message as something other than it really is.

If I then rotate the device from portrait to landscape it is correctly displayed, but when rotating back to portrait it is wrong again. So whatever the cause it’s ‘sticky’ and only closing the app and reopening it fixes it.

Looking at the detailed crash statistics I’m seeing the same, with the most common crash being:

[split_config.arm64_v8a.apk!libSDL2.so] GLES_RunCommandQueue SIGSEGV

I am using the GLES 1.1 backend (not GLES 2.0 or later) - I have to because I need the glLogicOp() function which is only available in GLES 1 - so whether that’s an aggravating factor I don’t know.

Ah, you know what, I started getting exactly that when I started playing around with event watching from your reply to my question about my Android app not triggering on messages like SDL_APP_WILLENTERBACKGROUND

I found when I added SDL_SetEventFilter(HandleAppEvents, NULL); I got exactly the problem you described.

My only thought was maybe the back/circle/square buttons are still drawn on the screen when the screen size is calculated, so that area gets chopped off the dimensions.

My understanding is that you have to handle SDL_APP_WILLENTERBACKGROUND in an Event Filter because by the time it gets passed to your normal SDL_PollEvent() handler it may be too late. So what I do is to set a global flag in my Event Filter and block the main loop when set.

Of course there’s no way that I can guarantee that I won’t call SDL_RenderPresent() after that flag has been set, because my code might have progressed beyond the point in the main loop where it is tested by then.

Is there a better way of doing this?
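One possible refinement, offered only as a sketch: make the flag atomic and test it a second time immediately before presenting. This narrows the window but cannot close it completely, since the filter can still fire between the last check and the present call. All names below are hypothetical; `present_frame()` merely stands in for the point where SDL_RenderPresent() would be called.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: set from the event filter, read by the main loop. */
static atomic_bool g_in_background = false;

/* Counts presented frames; stands in for the SDL_RenderPresent() call. */
static int g_frames_presented = 0;

static void present_frame(void) { g_frames_presented++; }

/* Would be called from the filter on SDL_APP_WILLENTERBACKGROUND. */
void on_will_enter_background(void) { atomic_store(&g_in_background, true); }

/* Would be called on SDL_APP_DIDENTERFOREGROUND. */
void on_did_enter_foreground(void) { atomic_store(&g_in_background, false); }

/* One main-loop iteration. Returns true if a frame was presented. */
bool run_one_frame(void)
{
    if (atomic_load(&g_in_background))
        return false;               /* skip update and drawing entirely */

    /* ... update state and issue draw calls here ... */

    if (atomic_load(&g_in_background))
        return false;               /* re-check right before presenting */

    present_frame();
    return true;
}

int frames_presented(void) { return g_frames_presented; }
```

The second check costs almost nothing, and a frame that is drawn but never presented is harmless.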

Does seem odd to me that SDL doesn’t seem to have an easy way to trap when the user hits the square button so we can save the current game state. I’ve got round it for now by my game auto-saving itself every 30 seconds, but that’s not ideal.

Have you tried removing the event watching to see if that fixes the strange screen size problem? If that fixes it for you too I guess there’s an issue in SDL?

If I remove the event watching, where shall I test for the SDL_APP_WILLENTERBACKGROUND message (if indeed it’s true that doing so in the SDL_PollEvent() handler may be too late)? I don’t want to replace one problem with potentially a more serious one.

There’s a rather concerning comment at Stack Overflow which implies that the Event Watch handler is called only in SDL_PollEvent(), contradicting the description that it is “triggered when an event is added to the event queue”. I hope that’s not accurate.

Hey,

The Event Watch handler is called in SDL_PushEvent() (see the SDL_events.c code).

The {Will,Did}Enter{Back,Fore}ground events are no longer pushed as events, and expect an Event Watch (see SDL_SendAppEvent in SDL_events.c).

Some internals:

On Android, SDL works with 2 threads (there are others too, but they don’t matter here):

  • Android SDLActivity thread: the Java side, which receives all system and environment information.
  • main C thread: your app and 99% of the SDL library code.
    (The other threads are the SDL audio and video capture threads, and all the Android system driver threads.)

So SDLActivity is a listener for all system events and has to send them to the main C thread and the SDL library code. SDLActivity has some SDL code to do that: for instance, it can use SDL_PushEvent() to send an event (like FINGER_MOTION or SDL_APP_WILLENTERBACKGROUND) that you will then get with SDL_PollEvent().

That’s fine, because SDL_PushEvent() is thread-safe with respect to SDL_PollEvent().

But now we add an Event Watch:

When SDL_PushEvent() is called, an event watch callback might be triggered.
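To illustrate the point, here is a toy single-threaded model (not SDL's real code or types; every `toy_` name is invented for this sketch) in which the watch runs synchronously inside the push, in whichever thread does the pushing, while polling only drains the queue.

```c
#include <stddef.h>

/* Toy stand-ins, not SDL's real types or functions. */
typedef struct { int type; } ToyEvent;
typedef void (*ToyWatch)(void *userdata, const ToyEvent *ev);

#define TOY_QUEUE_MAX 16
static ToyEvent toy_queue[TOY_QUEUE_MAX];
static size_t toy_count = 0;
static ToyWatch toy_watch = NULL;
static void *toy_watch_data = NULL;

void toy_add_watch(ToyWatch cb, void *userdata)
{
    toy_watch = cb;
    toy_watch_data = userdata;
}

/* The watch runs here, synchronously, in whichever thread pushes. */
int toy_push(const ToyEvent *ev)
{
    if (toy_watch)
        toy_watch(toy_watch_data, ev);
    if (toy_count == TOY_QUEUE_MAX)
        return 0;                   /* queue full: event not stored */
    toy_queue[toy_count++] = *ev;
    return 1;
}

/* Polling only drains the queue; it never calls the watch. */
int toy_poll(ToyEvent *out)
{
    if (toy_count == 0)
        return 0;
    *out = toy_queue[0];
    for (size_t i = 1; i < toy_count; i++)
        toy_queue[i - 1] = toy_queue[i];
    toy_count--;
    return 1;
}

/* Example watch: counts how many events it has seen. */
void toy_count_watch(void *userdata, const ToyEvent *ev)
{
    (void)ev;
    ++*(int *)userdata;
}
```

In this model the counter is already incremented when `toy_push()` returns, before anything has been polled, which is the behaviour described for SDL_PushEvent().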

There are not many Event Watches inside SDL itself (at most 5 in SDL3, depending on configuration).

SDL_RendererEventWatch is perhaps the oldest one; in SDL3 it reacts to:

    if (event->type == SDL_EVENT_WINDOW_RESIZED ||
        event->type == SDL_EVENT_WINDOW_PIXEL_SIZE_CHANGED ||
        …
    } else if (event->type == SDL_EVENT_WINDOW_HIDDEN) {

    } else if (event->type == SDL_EVENT_WINDOW_SHOWN) {
        …
    } else if (event->type == SDL_EVENT_WINDOW_MINIMIZED) {

    } else if (event->type == SDL_EVENT_WINDOW_RESTORED ||
               event->type == SDL_EVENT_WINDOW_MAXIMIZED) {
        …
    } else if (event->type == SDL_EVENT_WINDOW_DISPLAY_CHANGED ||
               event->type == SDL_EVENT_WINDOW_HDR_STATE_CHANGED) {

    }

That’s not quite OK, because it’s not clear whether everything in there (the elided parts) is thread-safe or not. But it probably is!

If you have your own event watch for RESIZED, you have to make sure it is thread-safe with your code.

For instance, on Android the RESIZED event can be sent from the SDLActivity thread, in the onNativeResize() method.

In that case the EventWatch is called within the SDLActivity thread, and it’s critical that you make it thread-safe.
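As a rough sketch of what "thread-safe" could look like for a RESIZED watch: the callback only records the new size in atomics, and the main thread picks it up once per frame. This uses C11 atomics rather than SDL's own atomic API, and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Latest size packed as (width << 32) | height, plus a "changed" marker. */
static _Atomic uint64_t g_pending_size = 0;
static atomic_bool g_size_dirty = false;

/* Safe to call from any thread, e.g. from an event watch callback. */
void note_resize(int32_t w, int32_t h)
{
    uint64_t packed = ((uint64_t)(uint32_t)w << 32) | (uint32_t)h;
    atomic_store(&g_pending_size, packed);   /* publish the size first */
    atomic_store(&g_size_dirty, true);       /* then mark it as new */
}

/* Called on the main thread once per frame.
   Returns true (and fills w/h) if a new size has arrived. */
bool take_resize(int32_t *w, int32_t *h)
{
    if (!atomic_exchange(&g_size_dirty, false))
        return false;
    uint64_t packed = atomic_load(&g_pending_size);
    *w = (int32_t)(packed >> 32);
    *h = (int32_t)(packed & 0xFFFFFFFFu);
    return true;
}
```

Packing both dimensions into one 64-bit value means the main thread can never see the width of one resize paired with the height of another. Only the most recent size is kept, which is usually what a layout pass wants.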

For SDL_APP_WILLENTERBACKGROUND, I think thread safety will always be OK, because there are some other new internal “lifecycle” events that are sent from SDLActivity to the C thread. So the SDL event pushes (and therefore the EventWatch handler calls) are done in the main C thread.

(And in fact you have to use an EventWatch, because I think the background events are not pushed anymore.)

(But maybe some handshaking/acknowledgement is missing in the distribution of those lifecycle events: the activity may continue while the events have not yet been processed.)

hope it helps,

My Event Watch callback tests only for SDL_APP_WILLENTERBACKGROUND and SDL_APP_DIDENTERBACKGROUND and is inherently thread safe because all it does is set a global flag:

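/* bBackground is a global flag that the main loop tests. */
static int myEventFilter(void* userdata, SDL_Event* pev)
{
    switch (pev->type)
    {
        case SDL_APP_WILLENTERBACKGROUND:
        case SDL_APP_DIDENTERBACKGROUND:
            bBackground = 1;    /* tell the main loop to stop rendering */
            break;
    }
    /* Ignored by SDL_AddEventWatch(); with SDL_SetEventFilter(),
       returning 0 would drop the event from the queue. */
    return 0;
}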

All other events are handled via SDL_PollEvent() or, in my particular case, via SDL_PumpEvents() and SDL_PeepEvents().

So I can see no reason why any of the crashes or other misbehavior in Android could result from what I’m doing.

I’m thinking that a SIGSEGV in GLES_RunCommandQueue() must be a clue of some sort. What might cause that?

Indeed, this seems OK as an EventWatch handler.

The only thing the GLES_RunCommandQueue / glLogicOp clue tells us is that it’s probably not the issue itself. It’s either an issue with the Android device or a side effect of something elsewhere in SDL, possibly something like rendering in the background (not because of your code, but because SDL does not wait for the background event to be consumed).

Do you think this SIGSEGV is a regression, or has it always been there?

Do you get any other strange crashes that might be more precise?

Also check: Android vitals > Crash and ANR > All non fatals

I found something interesting there.

Otherwise, for what it’s worth, I had an issue using SDL_GetTouchFingers(), and that was fixed in SDL3.

I don’t think this has been back-ported to SDL2.

I suspect it has always been there, but it’s happening at quite a low rate, about once or twice a day according to the Google Play statistics.

There are a small number of other crashes, but the SIGSEGV in GLES_RunCommandQueue() is happening far more often than anything else.

Hopefully it soon will be, because unless and until SDL3 supports the OpenGL ES1.1 backend I am not able to use it (or sdl2-compat).

1/ Check the “non fatals” in the Android console:

Vitals > Crash and ANR > All non fatals

It may contain other crashes, for example something running under ASAN that gives an accurate report of the issue (whereas glLogicOp seems to be misleading information).

2/ If you use SDL_GetTouchFingers(), this may be the issue.

3/ Check the other issues; sometimes there are comments that show whether a memory issue is going on somewhere (which could also be responsible for the glLogicOp crash).

There are none.

I don’t.

I think you may have misunderstood. I use glLogicOp(), but I don’t have any problems with it, it works perfectly.

The main problems I have are:

  1. SIGSEGV crashes.
  2. Cropping of the output, as if SDL2 is reporting the screen size incorrectly.
  3. Failure to restore correctly after putting my app in the background.

These are all quite rare, but they should not be happening, especially as my app is completely stable on other platforms.

OK, good.

OK, so you don’t have this issue.

(Just to be sure: that is the SDL3 name; in SDL2 the equivalents are SDL_GetNumTouchFingers() and SDL_GetTouchFinger().) As I write this, I realise the issue may not exist in SDL2, since it is specific to the new SDL3 SDL_GetTouchFingers() (note the ‘s’).

Yes, I got it. But you do have a problem: it crashes. And probably not because of bad usage of glLogicOp(), since as you said your app is OK most of the time.

For me, this kind of issue is either:

  • a bad Android device somewhere, and there is nothing you can do.
  • some SDL memory issue.

Since there is no clue pointing to an SDL memory issue, stick with “nothing you can do”.

  • Is this cropping like an orientation mismatch? Try hard-coding the orientation in AndroidManifest.xml.
  • It could be a size difference between fullscreen and immersive mode.
    Check the Java SDLActivity.java and hard-code one mode.
    Try removing setOnSystemUiVisibilityChangeListener().

Maybe there are some logs in the console?

Usually this is reproducible by adding timing at strategic places.

Add logging to make sure rendering is not done after going to the background, and that the SDL EGL context is not restored while in the background (src/video/android/SDL_androidevents.c).
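A minimal sketch of that kind of logging on the application side, assuming the app tracks its own background state. All names are hypothetical; in a real Android build the `fprintf` would more usefully be SDL_Log(), which ends up in logcat.

```c
#include <stdbool.h>
#include <stdio.h>

static bool g_backgrounded = false;
static int g_late_presents = 0;

/* Call with true on entering the background, false on returning. */
void set_backgrounded(bool value) { g_backgrounded = value; }

/* Call just before the real present. Logs and returns false if the
   app is in the background, so the caller can skip the present. */
bool check_present_allowed(const char *where)
{
    if (g_backgrounded) {
        g_late_presents++;
        fprintf(stderr, "present attempted in background at %s (count=%d)\n",
                where, g_late_presents);
        return false;
    }
    return true;
}

int late_present_count(void) { return g_late_presents; }
```

If the counter ever becomes non-zero in the field, that would be direct evidence that frames are being attempted after the background transition.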

No, I’ve never seen a crash in or near my calls to glLogicOp(); I have no reason to think it is related in any way to the problems I am experiencing. SDL2 supports direct calls to OpenGL functions (I make sure to call SDL_RenderFlush() as needed) so I am not doing anything ‘risky’.

No. I explained earlier in the thread that if I rotate from portrait to landscape the framing is correct, but if I return to portrait (when the fault is present) it is incorrectly cropped again.

Like all the other issues, this is an intermittent and rare fault. The sort of cause you suggest would surely result in consistent misbehavior, which I don’t see.

I know that Android poses particular difficulties for SDL2, which may contribute to these problems, but if it’s not possible to make SDL2 stable in Android it would be better to admit that and not claim to support that platform.

OK, I hope that helps. Good luck.

I’m suspicious that the error is caused because Android drops data when the context changes (Like rotating the screen). I’m way out of date as far as developing on Android, but can you possibly catch Android functions like onSaveInstanceState() from within your app?

I suspect SIGSEGV is happening because the renderer’s pointer has been invalidated by the rotation. On some systems this stale pointer might still point to the old renderer’s data, while on others it is overwritten with NULL or some other garbage address outside the program’s range. Accessing the old data gives you bad cropping, while attempting to access an invalidated pointer is “accessing invalid memory” as far as the system is concerned, giving SIGSEGV.
That potentially explains why you see bad cropping on most systems and some others get SIGSEGV.

Back to it:
I think destroying the old and creating a new renderer might fix the issue. (But if it’s already invalidated, then trying to destroy/access the old one might cause the very same problem, so you may need to destroy the old one before the rotation is processed by the system and create a new one after the rotation)

I think that’s a potential difference between the two window resize events. Does one of the below consistently fire first when rotating?

    SDL_WINDOWEVENT_RESIZED,        /**< Window has been resized to data1xdata2 */
    SDL_WINDOWEVENT_SIZE_CHANGED,   /**< The window size has changed, either as
                                         a result of an API call or through the
                                         system or user changing the window size. */

I’m worried that those two events might not be guaranteed in the same event loop, which allows a frame to be drawn on that invalid/deleted pointer. You might have to go one step further by having a pointer to your rendering function that you can flip to a “Do nothing” function while the pointer is invalid to guarantee it is not accessed in this time. Then flip back to the actual render function once the new renderer is created.
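The function-pointer flip could look something like this. It is only a sketch of the idea with hypothetical names; the real render function would hold the app's drawing code, and the two switch functions would be called from wherever the renderer or context is torn down and rebuilt.

```c
typedef void (*RenderFn)(void);

static int g_draw_calls = 0;

/* Stand-in for the real rendering code. */
static void render_real(void) { g_draw_calls++; }

/* Used while the renderer/context is invalid: touches nothing. */
static void render_noop(void) { }

static RenderFn g_render = render_real;

/* Call before the old renderer is destroyed or may be invalidated. */
void renderer_invalidated(void) { g_render = render_noop; }

/* Call once the new renderer has been created successfully. */
void renderer_recreated(void) { g_render = render_real; }

/* The main loop always calls this, whatever the current state. */
void render_frame(void) { g_render(); }

int draw_calls(void) { return g_draw_calls; }
```

If the switch can happen on a different thread from the main loop, the pointer itself would need to be atomic or otherwise synchronised.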

  • I apologize for the amount of guess-work that I’ve put into this post.

Do any other window events fire when the program gets put to background and brought back, like SDL_WINDOWEVENT_HIDDEN, SDL_WINDOWEVENT_SHOWN, SDL_WINDOWEVENT_EXPOSED,
SDL_WINDOWEVENT_LEAVE, SDL_WINDOWEVENT_ENTER,
…focus_gained, …focus_lost etc?

I don’t know if one of these might be a more reliable API to destroy the old renderer, avoid render calls, and create the new one on restoration.

Edit: I see we’re talking GLES rather than the SDL_Renderer, but I think that it’s a fair analogue. Try doing it with your GLES context in a similar manner: Destroy it when leaving, recreate it when the program regains focus. Avoid accessing GL functions when in the background or rotated until the new context is ready.