More MIT-SHM stuff (reply to earlier msg)

Drew_Hess · December 21, 2000, 11:23pm

Why don’t you read-ahead and buffer the undecoded frames in memory
instead?
Then just decode next frame immediately after calling UpdateRect() on
the previous one, and sleep until it should be displayed. You may still
do the reading from another thread if you like.

Because for some movies I’m decode-limited, and I want to be decoding
constantly. For these movies, I don’t need the extra surfaces because I
can’t quite keep up with the desired frame rate anyway, and I’m drawing
as soon as they’re ready.

But for others, I’m not decode-limited, and in these cases I want to
buffer up decoded frames to be opportunistic about disk/network bandwidth
when I’ve got it.

So the mechanism I’m currently using is flexible and works well for both
of these cases; the only drawback is the extra blit that’s required for
the non-shmapped surfaces. If I can fix the SDL “limitation” of only one
shmapped surface, everything is groovy and I get the best of both worlds.

-dwh-

Drew_Hess · December 22, 2000, 2:00am

corrections to my own post after further reflection:

Why don’t you read-ahead and buffer the undecoded frames in memory
instead?
Then just decode next frame immediately after calling UpdateRect() on
the previous one, and sleep until it should be displayed. You may still
do the reading from another thread if you like.

Because for some movies I’m decode-limited, and I want to be decoding
constantly. For these movies, I don’t need the extra surfaces because I
can’t quite keep up with the desired frame rate anyway, and I’m drawing
as soon as they’re ready.

Actually, when I’m decode-limited, I need the additional surfaces because
I don’t want the decoder to wait for the surface to free up while
SDL_UpdateRect() is doing its thing. I’ve got one thread doing decode and
the other handling the frame updates so that, on an MP machine (which is
the intended target platform), I’ll get concurrency.
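
A minimal sketch of the two-thread arrangement described above, assuming POSIX threads rather than SDL's thread API. The names `frame_queue`, `frame_queue_push`, `frame_queue_pop` and `NUM_SLOTS` are hypothetical, and plain integers stand in for decoded surfaces. The point is that with more than one slot the decoder blocks only when every slot is full, not whenever the display thread is busy.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_SLOTS 3   /* hypothetical number of back buffers */

typedef struct {
    int frames[NUM_SLOTS];      /* stand-in for decoded pixel buffers */
    size_t head;                /* next slot the display thread reads */
    size_t count;               /* decoded frames waiting for display */
    pthread_mutex_t lock;
    pthread_cond_t not_full;
    pthread_cond_t not_empty;
} frame_queue;

void frame_queue_init(frame_queue *q) {
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Decoder side: waits only when all slots are occupied. */
void frame_queue_push(frame_queue *q, int frame) {
    pthread_mutex_lock(&q->lock);
    while (q->count == NUM_SLOTS)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->frames[(q->head + q->count) % NUM_SLOTS] = frame;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Display side: waits until a decoded frame is available.
 * The frame value is copied out here, so the slot is freed at once;
 * with real surfaces the slot would be released after the update. */
int frame_queue_pop(frame_queue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int frame = q->frames[q->head];
    q->head = (q->head + 1) % NUM_SLOTS;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return frame;
}
```

With `NUM_SLOTS` set to 1 this degenerates to the single-surface case, where the decoder has to wait for each update to finish.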

But for others, I’m not decode-limited, and in these cases I want to
buffer up decoded frames to be opportunistic about disk/network bandwidth
when I’ve got it.

In this case your proposal would probably work fine; of course, so does
the one I describe.

So the mechanism I’m currently using is flexible and works well for both
of these cases; the only drawback is the extra blit that’s required for
the non-shmapped surfaces. If I can fix the SDL “limitation” of only one
shmapped surface, everything is groovy and I get the best of both worlds.

I got my cases backwards, but the above conclusion still applies.

thanks for the feedback
-dwh-

On Thu, 21 Dec 2000, Drew Hess wrote:

Mattias_Engdegard · December 22, 2000, 10:48am

Actually, when I’m decode-limited, I need the additional surfaces because
I don’t want the decoder to wait for the surface to free up while
SDL_UpdateRect() is doing its thing. I’ve got one thread doing decode and
the other handling the frame updates so that, on an MP machine (which is
the intended target platform), I’ll get concurrency.

That’s the only case when multiple back buffers would help — when you are
decode-limited on an SMP. I was assuming that if you have an SMP box, you
probably have a cpu fast enough to be able to decode in real time anyway.

And even if that is not true, you would gain a lot more from decoding
several frames in parallel than merely decoding in one thread and
shoving bits across the bus in another. I presume decoding is the real
cpu hog, and where an SMP box would be really useful. Multiple back
buffers would help here too but as I said it’s not scheduled for 1.2.
You can hack SDL to do it, or use a hardware surface.

You would find it useful to have a way of finding the number of cpus
on the machine. We could easily add a call for that to the SDL threads
API (we already have some code to do it in SDL_x11image.c).

Mattias_Engdegard · December 22, 2000, 2:14pm

I wrote:

And even if that is not true, you would gain a lot more from decoding
several frames in parallel than merely decoding in one thread and
shoving bits across the bus in another

This obviously does not improve matters. Sorry for the confusion.

Drew_Hess · December 22, 2000, 6:50pm

That’s the only case when multiple back buffers would help — when you are
decode-limited on an SMP. I was assuming that if you have an SMP box, you
probably have a cpu fast enough to be able to decode in real time anyway.

And even if that is not true, you would gain a lot more from decoding
several frames in parallel than merely decoding in one thread and
shoving bits across the bus in another. I presume decoding is the real
cpu hog, and where an SMP box would be really useful. Multiple back
buffers would help here too but as I said it’s not scheduled for 1.2.
You can hack SDL to do it, or use a hardware surface.

Right, I do plan on creating additional threads for decoding frames in
parallel (the display thread will be sleeping most of the time, so I
can get nearly 100% of both CPUs doing decode on a dual-proc machine), and
even in that case I need someplace to write them, hence my desire to have
additional surfaces.

HW surfaces aren’t an option for this particular app because it runs in
X11 and needs to be windowed.

I’ll add the support for multiple “video” surfaces and send you some code
once I’ve got it working in case you’re interested in putting it in 1.2.

You would find it useful to have a way of finding the number of cpus
on the machine. We could easily add a call for that to the SDL threads
API (we already have some code to do it in SDL_x11image.c).

Yeah I would. I saw num_CPUs() in there and could expose that to the API.
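
For illustration, a sketch of what such a CPU-count query might look like outside SDL. This is not the `num_CPUs()` code from SDL_x11image.c, which is not shown in this thread; `cpu_count` and `decoder_thread_count` are hypothetical names, and `_SC_NPROCESSORS_ONLN` is a common extension (Linux, BSD, macOS) rather than something every POSIX system guarantees.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: number of online CPUs, never less than 1. */
int cpu_count(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);  /* -1 if unsupported */
    return n < 1 ? 1 : (int)n;
}

/* Assumption taken from the thread: the display thread sleeps most of
 * the time, so one decoder thread per CPU is a reasonable start. */
int decoder_thread_count(int cpus) {
    assert(cpus >= 1);
    return cpus;
}
```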

thanks again for the feedback
-dwh-

On Fri, 22 Dec 2000, Mattias Engdegard wrote:

Drew_Hess · December 22, 2000, 6:58pm

And even if that is not true, you would gain a lot more from decoding
several frames in parallel than merely decoding in one thread and
shoving bits across the bus in another. I presume decoding is the real
cpu hog, and where an SMP box would be really useful. Multiple back
buffers would help here too but as I said it’s not scheduled for 1.2.
You can hack SDL to do it, or use a hardware surface.

Right, I do plan on creating additional threads for decoding frames in
parallel (the display thread will be sleeping most of the time, so I
can get nearly 100% of both CPUs doing decode on a dual-proc machine), and
even in that case I need someplace to write them, hence my desire to have
additional surfaces.

Oh, and I should mention (to answer your comment in a later mail about
this not helping) that in the codecs I’m using, the frames are stored as
JPEGs and there’s no interframe compression (i.e. each frame is
independent of all others), so having additional CPUs+threads for parallel
decoding does help, in my case. I still want the shmapped surfaces,
though, because every 24th of a second one of the CPUs will have to do an
SDL_UpdateRect, and I want it to be as fast as possible so I can get back
to decoding.
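
Because each frame is independent, the parallel decode can be sketched as a pool of workers that each claim the next undecoded frame index. This assumes POSIX threads; `decode_job`, `decode_worker`, `decode_all` and the placeholder `decode_frame` are hypothetical names, and the "decode" is a stand-in computation, not a real JPEG decoder.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 16

typedef struct {
    int next_frame;          /* next frame index nobody has claimed */
    int total_frames;
    int *decoded;            /* decoded[i] holds the result for frame i */
    pthread_mutex_t lock;
} decode_job;

/* Placeholder for a real per-frame JPEG decode. */
static int decode_frame(int index) { return index * 2; }

static void *decode_worker(void *arg) {
    decode_job *job = arg;
    for (;;) {
        /* Claim one frame index under the lock. */
        pthread_mutex_lock(&job->lock);
        int i = job->next_frame;
        if (i < job->total_frames)
            job->next_frame++;
        pthread_mutex_unlock(&job->lock);
        if (i >= job->total_frames)
            break;
        /* Decode outside the lock; no other frame is needed. */
        job->decoded[i] = decode_frame(i);
    }
    return NULL;
}

/* Returns the number of worker threads actually started. */
int decode_all(int *out, int total_frames, int workers) {
    assert(workers >= 1 && workers <= MAX_WORKERS);
    decode_job job;
    job.next_frame = 0;
    job.total_frames = total_frames;
    job.decoded = out;
    pthread_mutex_init(&job.lock, NULL);

    pthread_t tid[MAX_WORKERS];
    int started = 0;
    for (int i = 0; i < workers; i++) {
        if (pthread_create(&tid[i], NULL, decode_worker, &job) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&job.lock);
    return started;
}
```

This scheme would not work for codecs with interframe compression, where a frame cannot be decoded before the frames it depends on.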

-dwh-

On Fri, 22 Dec 2000, Drew Hess wrote:

Mattias_Engdegard · December 22, 2000, 7:49pm

Oh, and I should mention (to answer your comment in a later mail about
this not helping) that in the codecs I’m using, the frames are stored as
JPEGs and there’s no interframe compression (i.e. each frame is
independent of all others), so having additional CPUs+threads for parallel
decoding does help, in my case.

Looking again, it seems that it’s indeed possible, at the price of
considerable complexity and the risk of ping-ponging processes between
the CPUs (with everything that implies for TLB/cache locality).

From another posting:
HW surfaces aren’t an option for this particular app because it runs in
X11 and needs to be windowed.

DGA2 has some support for direct window access but nothing SDL can use,
and it’s doubtful that this will change in the near future.

I’ll add the support for multiple “video” surfaces and send you some code
once I’ve got it working in case you’re interested in putting it in 1.2.

Please do, but I’ll have to think very hard about the design if it’s going in,
since I’m reluctant to add something that big just to solve a special case.
You may of course alter SDL as you wish for your special purposes (as long
as you keep the changes public).

Martin_Donlon · December 23, 2000, 12:19am

It does?

On Fri, Dec 22, 2000 at 08:49:30PM +0100, Mattias Engdegård wrote:

From another posting:
HW surfaces aren’t an option for this particular app because it runs in
X11 and needs to be windowed.

DGA2 has some support for direct window access but nothing SDL can use,
and it’s doubtful that this would change in the near future
--
Martin

--
Bother, said Pooh, as he saw Ms. Bobbit drive up.