Last night I finally got our DV decoder to use SDL. I can play 720x480
video in full-screen in a 2048x1536 modeline rather nicely, albeit
slowly…
My question is: what kind of speed should I be expecting? My first test
program looked like this:
    for (i = 0; i < 300; i++) {
        SDL_PumpEvents();
        SDL_LockYUVOverlay(overlay);
        memcpy(overlay->pixels, yuvdata, length);
        SDL_UnlockYUVOverlay(overlay);
        SDL_DisplayYUVOverlay(overlay, &screenrect);
    }
This was taking about 6-7 seconds (real-time would be 10) on my K6-2
running at 360 MHz, with a G400 MAX dual-head, running XFree86 4.0 from Rawhide.
This seems awfully high to me, with very little room left over for actual
processing (DV is about as heavy as software MPEG-2 MP at ML, that is to say:
very). Granted, the machine’s running the stock RH6.2 kernel, i.e. no AGP
kernel code. I’m attempting to run it on 2.4.0-test1-ac3, but for some
reason everything is segfaulting under that kernel…
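
To put the 6-7 second figure in per-frame terms, here is a small sketch of the arithmetic. It assumes exactly 30 frames/sec (300 frames in 10 seconds); the function names are made up for illustration.

```c
/* Milliseconds available for each frame at a given frame rate. */
double frame_budget_ms(double fps)
{
    return 1000.0 / fps;
}

/* Milliseconds actually spent per frame, from a total run time. */
double frame_cost_ms(double total_seconds, int frames)
{
    return total_seconds * 1000.0 / frames;
}

/* Milliseconds left over per frame for decoding and everything else. */
double frame_headroom_ms(double total_seconds, int frames, double fps)
{
    return frame_budget_ms(fps) - frame_cost_ms(total_seconds, frames);
}
```

With 300 frames, a 6 second run costs 20 ms per frame and a 7 second run about 23.3 ms, against a budget of about 33.3 ms. That leaves roughly 10 to 13 ms per frame for the decoder itself.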
Total bandwidth should be about 20MB/sec at 30 frames/sec, which shouldn’t
even make the bus break a sweat. Am I doing something else wrong? It’s
hard to tell from the smpeg code what the proper sequence is. smpeg seems
to do a SDL_Flip() in plaympeg.c, but I’m not sure how that works with
overlays.
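
The 20MB/sec estimate can be checked with a few lines of arithmetic. This is only a sketch: it assumes packed 16-bit YUV (YUY2) frames, which the post does not state outright, and the helper names are hypothetical.

```c
/* Bytes in one frame. bits_per_pixel is 16 for packed YUY2 and
 * 12 for planar YV12 (an assumption about the formats in play). */
unsigned long frame_bytes(unsigned width, unsigned height,
                          unsigned bits_per_pixel)
{
    return (unsigned long)width * height * bits_per_pixel / 8;
}

/* Sustained transfer rate in MB/sec, with 1 MB taken as 10^6 bytes. */
double rate_mb_per_sec(unsigned long bytes_per_frame, double fps)
{
    return (double)bytes_per_frame * fps / 1e6;
}
```

For 720x480 this gives 691200 bytes per frame in YUY2, or about 20.7 MB/sec at 30 frames/sec, and about 15.6 MB/sec in YV12. Read the other way, 300 YUY2 frames in 6-7 seconds means the test loop itself only sustains somewhere around 30-35 MB/sec.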
Sometime early this week I hope to get my hands on some HDTV material,
which I’ll be trying to display on this machine (but most definitely not
real-time), so I’ll have a killer test-case for the YUV overlays
(1920x1080). I also just noticed that I have to get the CVS XFree, since
there was a commit two days ago that enables texture-based overlays on
the G400. This is notable since I just ran across a table in CVS that says
regular overlay is limited to 1024x1024, whereas texture overlays can go
to 2046x2046.
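
As a quick sanity check on those limits, here is a sketch of the size test. The two constants are the figures quoted from the CVS table above; the function and constant names are invented for this example, and the limit is assumed to apply to each dimension separately.

```c
/* Limits quoted above from the table in XFree86 CVS. */
enum {
    REGULAR_OVERLAY_MAX = 1024,
    TEXTURE_OVERLAY_MAX = 2046
};

/* Returns 1 if a width x height source image fits within max_dim
 * in both dimensions, 0 otherwise. */
int fits_overlay(int width, int height, int max_dim)
{
    return width <= max_dim && height <= max_dim;
}
```

A 720x480 DV frame fits either way, but 1920x1080 only passes against the 2046 limit, which is why the texture-based path matters for the HDTV test. If that material were pushed as 16-bit packed YUV at 30 frames/sec, it would need about 124 MB/sec.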
I just spent a few minutes perusing the MGA driver in XFree CVS, and it seems
that the X server is doing memcpy’s of all the data. AGP isn’t utilized
for the transfer at all. This is a significant time drain: top(1) shows
almost equal usage of both the test program and X. I would bet that a
profile of X during that time puts the vast majority of the cycles in
MGACopyData(), which is a simple striding memcpy. Things get worse if
you’re not using YUY2, since YV12 does a significant amount of work in
MGACopyMungedData():
    U32 *dst = (U32 *)dst1;
    for(j = 0; j < h; j++) {
        for(i = 0; i < w; i++) {
            dst[i] = src1[i << 1] | (src1[(i << 1) + 1] << 16) |
                     (src3[i] << 8) | (src2[i] << 24);
        }
        …
    }
I’ll try to decipher what that loop does, and see whether it’s actually
needed or just a redundant format conversion (it looks vaguely like a
planar->packed conversion, maybe?). According to FourCC
(webartz.com/fourcc), the G400 supports YV12 natively in DirectX, so such
a conversion shouldn’t be necessary. Regardless, I’ve signed up for
Matrox’s developer program, in hopes of being able to get some kind of
hardware assisted copy set up from the XShm segment.
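
For reference, a standalone sketch of what a planar->packed conversion of this kind looks like. It assumes the destination is YUY2 (bytes ordered Y0 U Y1 V) and the source is 4:2:0 planar as in YV12. On a little-endian machine the driver's 32-bit store would produce the same byte order if src1 is luma, src3 is U and src2 is V, but that mapping is a reading of the excerpt, not something confirmed from the driver. All names below are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Pack planar 4:2:0 data into packed YUY2 (byte order Y0 U Y1 V).
 * width and height are in pixels and must be even; pitches are in bytes.
 * Names are hypothetical and not taken from the MGA driver.
 */
void pack_420_to_yuy2(uint8_t *dst, size_t dst_pitch,
                      const uint8_t *y, size_t y_pitch,
                      const uint8_t *u, const uint8_t *v, size_t c_pitch,
                      int width, int height)
{
    for (int row = 0; row < height; row++) {
        uint8_t *out = dst + (size_t)row * dst_pitch;
        const uint8_t *yrow = y + (size_t)row * y_pitch;
        /* In 4:2:0, each chroma row is shared by two luma rows. */
        const uint8_t *urow = u + (size_t)(row / 2) * c_pitch;
        const uint8_t *vrow = v + (size_t)(row / 2) * c_pitch;

        /* Each iteration emits two pixels (four bytes). */
        for (int i = 0; i < width / 2; i++) {
            out[4 * i + 0] = yrow[2 * i];      /* Y0 */
            out[4 * i + 1] = urow[i];          /* U  */
            out[4 * i + 2] = yrow[2 * i + 1];  /* Y1 */
            out[4 * i + 3] = vrow[i];          /* V  */
        }
    }
}
```

The point of writing it out is that every output byte is touched once by the CPU on top of the copy itself, so if the hardware really can take YV12 directly, skipping this step would save a full pass over each frame.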
Anyway, you’ll probably be hearing more from me in the near future, as I
push as hard as I can on this stuff.
Erik Walthinsen <@Erik_Walthinsen> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
   __
  /  \         SEUL: Simple End-User Linux - http://www.seul.org/
 |    | M E G A     Helping Linux become THE choice
  \  /              for the home or office user