YUVOverlay performance problems

I’m trying to make a program that can deinterlace full-screen PAL video
(768x576 at 25 fps, i.e. 40 ms per frame) into progressive frames with
scaling in real time, and I’ve achieved this target. My deinterlace
routine now takes about 20 ms on my hardware. I’m still hoping to add
real-time noise filtering. Unfortunately, showing the resulting YUV image
takes too much time, and because of it I’m already using all my CPU power.
I’m currently using the YV12 format because it seems to be the fastest,
and I deinterlace directly into the overlay.

Have I done something wrong, or should displaying a YUV overlay really be
this slow with hardware acceleration? Is there another way of showing an
overlay besides SDL_DisplayYUVOverlay()? With the source below I got the
following timings:

Format  Time   Hardware
YV12    17 ms  yes
IYUV    32 ms  no
YUY2    33 ms  yes
UYVY    33 ms  yes
YVYU    49 ms  no

Here “Hardware” is the value of SDL_Overlay.hw_overlay. I don’t know
about you, but if YV12 is supported by the hardware and I get direct(?)
access to the overlay memory, displaying the image shouldn’t take 17 ms
IMHO. If the call just blocked for that time it wouldn’t matter, but it
eats all the CPU…

I’ve tested with the following environment:

Linux 2.4.4
SDL 1.2.0
XFree 4.0.99.3 (from dri.sourceforge.net, Xvideo is supported)

MSI K7T Pro
AMD Duron 650
Matrox G400 Single 16MB
256MB

Also, is there a way to disable the screen saver through SDL, and a way
to wait for vertical sync?

– Mikko

Code used for testing follows:
( Compiling: gcc -o test test.c `sdl-config --cflags --libs` )
------------- test.c ----------------
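#include <stdio.h>
#include "SDL.h"

/* SDL expects the (int, char **) form of main on some platforms. */
int main(int argc, char *argv[])
{
    SDL_Surface *screen;
    SDL_Overlay *overlay; /* yuv overlay */
    SDL_Rect drect;
    Uint32 stime, t;
    int i;

    (void)argc;
    (void)argv;

    if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_TIMER) < 0) { /* SDL_INIT_AUDIO */
        fprintf(stderr, "SDL_Init failed: %s\n", SDL_GetError());
        return 1;
    }

    screen = SDL_SetVideoMode(800, 600, 16, SDL_HWSURFACE);
    if (screen == NULL) {
        fprintf(stderr, "SDL_SetVideoMode failed: %s\n", SDL_GetError());
        SDL_Quit();
        return 1;
    }

    overlay = SDL_CreateYUVOverlay(768, 576, SDL_YV12_OVERLAY, screen);
    if (overlay == NULL) {
        fprintf(stderr, "SDL_CreateYUVOverlay failed: %s\n", SDL_GetError());
        SDL_Quit();
        return 1;
    }

    /* Scale the 768x576 overlay to the whole 800x600 screen. */
    drect.x = 0;
    drect.y = 0;
    drect.w = 800;
    drect.h = 600;

    stime = SDL_GetTicks();
    for (i = 0; i < 100; i++) {
        t = SDL_GetTicks();
        SDL_LockYUVOverlay(overlay);

        /* real image rendering onto overlay pixels would be here */

        SDL_UnlockYUVOverlay(overlay);
        printf("SDL_Lock+UnlockYUVOverlay() took %u ms. ",
               (unsigned)(SDL_GetTicks() - t));

        t = SDL_GetTicks();
        SDL_DisplayYUVOverlay(overlay, &drect);
        printf("SDL_DisplayYUVOverlay() took %u ms.\n",
               (unsigned)(SDL_GetTicks() - t));
    }

    stime = SDL_GetTicks() - stime;
    printf("Average %f ms per frame.\n", (float)stime / 100);
    printf("Hardware: %s\n", overlay->hw_overlay ? "yes" : "no");

    SDL_FreeYUVOverlay(overlay);
    SDL_Quit();

    return 0;
}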
------------- /test.c ---------------
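The loop in test.c leaves the actual rendering as a placeholder. When writing into a locked overlay, each plane has its own row stride in `pitches[]`, which may be larger than the visible width, so rows are safest copied one at a time. The sketch below assumes the YV12 plane order Y, V, U (the order of `pixels[0..2]` in a YV12 overlay) and uses a hypothetical `struct planes` in place of `SDL_Overlay` so that it compiles without SDL; `copy_yv12` is likewise a made-up name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the plane pointers/pitches an SDL_Overlay exposes. */
struct planes {
    uint8_t *pixels[3]; /* YV12 order: Y, V, U */
    int pitches[3];     /* bytes per row of each plane; may exceed the width */
};

/* Copy a tightly packed YV12 frame (Y plane, then V, then U, no row padding)
 * into destination planes whose rows may be padded. Assumes even w and h. */
static void copy_yv12(struct planes *dst, const uint8_t *src, int w, int h)
{
    const int cw = w / 2, ch = h / 2; /* chroma plane dimensions */
    const int widths[3] = { w, cw, cw };
    const int heights[3] = { h, ch, ch };

    for (int p = 0; p < 3; p++) {
        for (int y = 0; y < heights[p]; y++) {
            memcpy(dst->pixels[p] + (size_t)y * dst->pitches[p], src,
                   (size_t)widths[p]);
            src += widths[p]; /* source rows are contiguous */
        }
    }
}
```

Between SDL_LockYUVOverlay() and SDL_UnlockYUVOverlay(), the struct could be filled from overlay->pixels[i] and overlay->pitches[i] before calling the copy.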

Hi Mikko,

> Where “Hardware” is according to information gathered from
> SDLOverlay.hw_overlay. I don’t know about you, but if YV12 is supported by
> hardware and I get direct(?) access to overlay memory, displaying the
> image shouldn’t take 17ms IMHO. If the call would just block for this time
> it wouldn’t matter but it eats all CPU…

The YV12 overlay uses a YUY2 hardware overlay on Matrox G400 cards.
I don’t know why the XFree86 developers chose not to support YV12
directly, as it’s much faster. As it is, the YV12 image has to be copied
byte by byte for every frame, interleaving the luminance and chrominance
samples to form valid YUY2 overlay data.
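To illustrate the per-frame work described above, here is a minimal sketch of a YV12-to-YUY2 conversion in plain C. It is not the XFree86 mga driver code; the function name, the assumption of tightly packed rows, and the choice to reuse each chroma row for two output lines (rather than interpolating) are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one planar YV12 frame to packed YUY2 (Y0 U Y1 V per pixel pair).
 * YV12 chroma covers 2x2 pixel blocks while YUY2 chroma covers 2x1, so each
 * chroma row is simply reused for two output rows.
 * Assumes even w and h and rows without padding. */
static void yv12_to_yuy2(const uint8_t *ysrc, const uint8_t *vsrc,
                         const uint8_t *usrc, uint8_t *dst, int w, int h)
{
    for (int y = 0; y < h; y++) {
        const uint8_t *yrow = ysrc + (size_t)y * w;
        const uint8_t *urow = usrc + (size_t)(y / 2) * (w / 2);
        const uint8_t *vrow = vsrc + (size_t)(y / 2) * (w / 2);
        uint8_t *out = dst + (size_t)y * w * 2;

        for (int x = 0; x < w; x += 2) {
            *out++ = yrow[x];     /* Y0 */
            *out++ = urow[x / 2]; /* U, shared by the pixel pair */
            *out++ = yrow[x + 1]; /* Y1 */
            *out++ = vrow[x / 2]; /* V, shared by the pixel pair */
        }
    }
}
```

Every output byte is touched by the CPU, which is consistent with the display call consuming CPU time even though the overlay itself is reported as hardware.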

I am working on a patch for the Matrox G400 (I have noticed the same poor
performance). Although it’s still a bit buggy (there are distortions when
downscaling 720x576 overlays), the performance has increased A LOT.

With the patch, test.c reports “Average 4.050000 ms per frame.” on:
RedHat 7.1
Linux 2.4.5
XFree86-4.0.99.901

Pentium II 350MHz
Matrox G400 (SH) 16MB
256MB

So your problem is probably caused by the current XFree86 mga driver,
not SDL. If you want to try the patch, just let me know, and I’ll email
it to you.

bye,

ewald