Bug in windows PumpEvents causes late event processing

I’m working with Natural Selection 2, which uses SDL for input handling, and lately we have noticed issues with the mouse input - it seemed to be choppy and hitch a lot.

After spending a few days debugging, I believe I have found the culprit.

The main loop in SDL_events.c::SDL_WaitEventTimeout

for (;;) {
    switch (SDL_PeepEvents(event, 1, SDL_GETEVENT, SDL_FIRSTEVENT, SDL_LASTEVENT)) {

… makes one big assumption - that SDL_PumpEvents will actually put an event on the internal event queue.

In turn, SDL_windowsevents.c::WIN_PumpEvents makes two assumptions

  • that PeekMessage will actually dispatch a message to the WIN_WindowProc, so it can put an event on the event queue
  • that the events will have its msg.time in order

Assumptions that are sane and rational, and thus of course wrong in Windows.

It turns out that Microsoft has added high-priority events of some kind that PeekMessage will handle internally - thus WIN_WindowProc will not be called for them.

In addition, these messages are inserted out of order - so the timestamp on them will often be the very recent, and with the 10-16ms resolution of the Windows system clock, this will often be the same as GetTickCount.

Thus, the loop in PumpEvents will hit the exit condition

        if (SDL_TICKS_PASSED(msg.time, start_ticks)) { 

becuase the msg.time will be >= start_ticks.

This breaks the assumption in SDL_WaitEventTimeout, and will cause it to return early, saying that it has processed all events even though there might be dozens of input events waiting to be processed.

The easy patch is to just remove the exit condition from PumpEvents, forcing it to process all events.

The slightly more complex patch is to verify that PeekMessage actually generated an event in the event queue before allowing the time check to break the loop.

The workaround for people using SDL is to call SDL_WaitEventTimeout multiple times … if there are no events to be processed, the cost is low, so calling it 3-10 times in a loop should be fine.



PS. Adding a bit of log output to show the weird time and events.

The leftmost field is the gametime clock,

** PumpEvents starting at GetTickCount 706824703
[210.377] MainThread : — PumpEvents 706824703
[210.377] MainThread : Processed 26 events, mouse motion -190,469
*** new frame starts here
[210.415] MainThread : — PumpEvents 706824750
** Get an unknown message here that are not routed to WIN_WindowProc
[210.415] MainThread : Peeked message: UNKNOWN (706824750)
** triggers the loop break condition
[210.415] MainThread : Break because msg.time 706824750, start_ticks 706824750
[210.415] MainThread : Processed 0 events, mouse motion 0,0
** This causes us to believe that there we no mouse motion during that frame
[210.415] MainThread : Mouse motion stop
[210.447] MainThread : — PumpEvents 706824781
[210.447] MainThread : Received windows message: 2c103c WM_GETICON – 2, 0x0, 0xCCCCCCCC
** Note time time on this event - 706824718; earlier than the previous event… this is a mouse event that should have been processed the previous frame
[210.447] MainThread : Peeked windows message: WM_INPUT (706824718)
[210.447] MainThread : Received windows message: 2c103c WM_INPUT – 0, 0xE1210F7D, 0x0
[210.448] MainThread : Relative 708,560 -> 3,21 (3,21)
[210.448] MainThread : Peeked windows message: WM_INPUT (706824718)
[210.448] MainThread : Received windows message: 2c103c WM_INPUT – 0, 0x306D0CB7, 0x0