Why is SDL so much faster than OpenGL

Hi folks,

I would like to have your advice/comments/help on the following topic.
I am developing a “simple” video-viewing application on a Linux box.

The requirements of the application SW are as follows:

  • display live PAL video (from a video camera hooked up to the framegrabber
    via composite signal) on the desktop at 25 Hz
  • overlay animated 3D graphics (HUD-style tunnel) on top of the video image

The problem is:

  • lousy performance

##################################################################

The hardware consists of

  • a (customer-furnished) PC
  • Gfx card: ???
  • CPU: ???
  • a PCMCIA frame grabber from HASOTEC (FG33)

The OS/driver SW is as follows

  • SuSE 10.0

    uname -r:
    2.6.13-15-default

  • Gfx-Capability:

    glxinfo:
    direct rendering: Yes

    OpenGL vendor string: Tungsten Graphics, Inc
    OpenGL renderer string: Mesa DRI Intel® 852GM/855GM 20050225 x86/MMX/SSE2
    OpenGL version string: 1.3 Mesa 6.2.1

  • a video4linux kernel module for the framegrabber, provided by HASOTEC

    lsmod | grep fg:
    fg3xv4l2 58624 0
    videodev 9088 1 fg3xv4l2

So I produced an application with the following features:
- a Qt framework with only a GL widget inside
- continuously grabbing an image using a C++ class wrapping typical v4l access
to the video device
- displaying the image in a QGLWidget
- drawing a 3D overlay

##################################################################
A very condensed overview of the SW is as follows:

// initialize() opens the video input and sets up a texture to hold the grabbed
// image. The size is 1024*1024 to comply with the power-of-two (2^n)
// requirement

#define TEXTURE_WIDTH 1024
#define TEXTURE_HEIGHT 1024

void SystDisp_Video::initialize()
{
    video = new videoInput("/dev/video0");

    void *texdata = calloc(1, TEXTURE_WIDTH * TEXTURE_HEIGHT * 3);
    glGenTextures(1, &texture[0]);
    glBindTexture(GL_TEXTURE_2D, texture[0]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, TEXTURE_WIDTH, TEXTURE_HEIGHT,
                 0, GL_BGR, GL_UNSIGNED_BYTE, texdata);
    free(texdata);  // glTexImage2D copies the data, so this no longer leaks

    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);

    glMatrixMode(GL_TEXTURE);
    glScalef((float)video->width() / TEXTURE_WIDTH,
             (float)video->height() / TEXTURE_HEIGHT, 1);
    glMatrixMode(GL_MODELVIEW);
}
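[Editor's note] The power-of-two padding and the glScalef() on the texture matrix boil down to a small piece of arithmetic. A standalone sketch (next_pow2 and tex_extent are illustrative helpers, not part of the original code):

```cpp
#include <cassert>

// Smallest power of two >= n, for the GL 1.x power-of-two texture rule.
inline unsigned next_pow2(unsigned n) {
    unsigned p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Fraction of the padded texture actually covered by the video image;
// this is the factor the glScalef() on the texture matrix encodes.
inline float tex_extent(unsigned image_size, unsigned texture_size) {
    return (float)image_size / (float)texture_size;
}
```

For 768x576 PAL frames this gives a 1024x1024 texture of which only 0.75 x 0.5625 is used, so roughly 58% of the texture's storage is padding.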

// paint() is basically called by a Qt timer set up with 20 ms
// grab_video() blocks until a new image is received
// If texture is enabled, the video image is drawn as a texture. In that case
// the texture is redefined using glTexSubImage2D().
// If texture is not enabled, the image is painted using glDrawPixels().

void SystDisp_Video::paint()
{
    video->grab_video();

    glLineWidth(1.0);
    glColor4f(1.0, 1.0, 1.0, 1.0);
    glPushMatrix();
    {
        if (texture)
        {
            //// TEXTURE
            glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0,
                            video->width(), video->height(),
                            GL_BGR, GL_UNSIGNED_BYTE, video->currentFrame());

            glBegin(GL_QUADS);
            glTexCoord2f(0.0f, 1.0f); glVertex2f(10.f, 10.f);             // bottom left
            glTexCoord2f(1.0f, 1.0f); glVertex2f(win_w - 10, 10.f);       // bottom right
            glTexCoord2f(1.0f, 0.0f); glVertex2f(win_w - 10, win_h - 10); // top right
            glTexCoord2f(0.0f, 0.0f); glVertex2f(10.f, win_h - 10);       // top left
            glEnd();
        }
        else
        {
            //// PIXEL
            glPixelZoom(1.0, -1.0);
            glRasterPos2i(0, win_h - 1);
            glDrawPixels(video->width(), video->height(), GL_BGR,
                         GL_UNSIGNED_BYTE, video->currentFrame());
        }
    }
    glPopMatrix();
}
##################################################################

The problems are as follows:

The application runs, in both variants, at only roughly 10 Hz. Commenting out
glTexSubImage2D() results in a nice, firm 25 Hz (matching the update rate of
the frame grabber).

The framegrabber driver comes with a demo application, using v4l and SDL,
which runs at the desired 25 Hz:

init()
{
    ...
    g_screenSurface = SDL_SetVideoMode( 768, 576, 16,
                                        SDL_HWSURFACE | SDL_DOUBLEBUF );
    g_videoSurface = SDL_CreateRGBSurfaceFrom( currentframe(), 768, 576, 16,
                                               768*2, 0x000000, 0x000000,
                                               0x000000, 0 );
}

render()
{
    ...
    SDL_BlitSurface( g_videoSurface, &r, g_screenSurface, &r );
    ...
}

So my questions are:

  • Is there anything I can do to speed up my Qt/OpenGL application? How about
    the texture issue? All the time is eaten up in glTexSubImage2D(). Is there
    an issue with the parameters?

  • I do not necessarily need fancy texture mapping. Is there a more direct way
    to bring the image to the frame buffer, while still keeping the possibility
    of combining it with a 3D overlay?

  • Is there a more performant Qt-only way (no OpenGL) to have the video and
    the 3D overlay?

  • Why does SDL perform so much better than Qt/OpenGL? What is the appropriate
    OpenGL way to get the same BlitSurface-style functionality?

  • I would have no objections to doing the application in SDL, but I have no
    clue how to do the 3D overlay in SDL.

Any help is welcome

Peter

It looks like you are using hardware acceleration with SDL. If X
supports it, this is a really quick way to blit pixels.

With SDL you are creating a 16 bit window.

What are you using with opengl?

Have you tried using GL_RGB instead of GL_BGR (it might be doing the
conversion in a slow software routine)? Have you tried using faster
filtering methods?
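[Editor's note] If the driver really is swizzling GL_BGR to RGB in software on every upload, one workaround is to do the swap once yourself and upload as GL_RGB. A minimal sketch (the in-place helper below is illustrative; whether it wins depends on what the driver actually does internally):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Swap the B and R channels of a tightly packed 24-bit image in place,
// so the buffer can be handed to glTexSubImage2D() as GL_RGB instead
// of GL_BGR.
inline void bgr_to_rgb_inplace(uint8_t* px, std::size_t pixel_count) {
    for (std::size_t i = 0; i < pixel_count; ++i)
        std::swap(px[3 * i], px[3 * i + 2]);
}
```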

Some paths in opengl are slow on different cards/drivers. So you
might be hitting a slow path somewhere. Try tweaking a few things and
see if it improves things.

Is vsync disabled for SDL, and not for opengl?

Have you tried using overlays? See $QTDIR/examples/opengl/overlay for
the source code.

Good luck!

On 6/28/06, Peter Stuetz wrote:

[…]

SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

#define TEXTURE_WIDTH 1024
#define TEXTURE_HEIGHT 1024
[…]
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, TEXTURE_WIDTH, TEXTURE_HEIGHT,
0, GL_BGR, GL_UNSIGNED_BYTE, texdata);

Is it possible to read in a smaller size and scale it up with GL_LINEAR
filtering? It might be faster and look about the same, since your
performance hit is all about the time it takes to move texture data to
the video chip and not about what it does with it afterwards…the Intel
852 has hardware trilinear filtering anyhow. Does the framegrabber
actually give you anywhere near 1024x1024?

// paint() is basically called by a Qt timer set up with 20 ms

Ugh, don’t do that. You’re probably not hitting this every 20ms. Better
to loop and draw when you can, or when you want. Definitely don’t do
this if grab_video() is going to block anyhow. Just do an infinite loop
that draws and dispatches Qt events. Worry about CPU usage last.

// grab_video() blocks until a new image is received

Is the 3D thing over the picture doing any animation? If so, don’t block
here (or render on a 20ms timer!), which will help your framerate…and
only do a new texture upload when you actually got a read from the
framegrabber. If the 3D stuff isn’t animated, consider whether you
really need OpenGL here (might be faster to use the 2D facilities, and
blit a static, prerendered image over the video frames).

The specsheet for that video chip claims it can use YUV images as
texture data, but I don’t know if that’s exposed anywhere in OpenGL.
Chances are the framegrabber can do YUV data faster.

                   glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, video->width(), video->height(),
          GL_BGR, GL_UNSIGNED_BYTE, video->currentFrame());

Try GL_RGB, and if that doesn’t help, see if you can get the
framegrabber to feed you data with an always-full alpha channel…the
Intel specsheet says it takes 16-bit and 32-bit data (and suggests that
32-bit mode is RGBA, not BGRA), so there’s probably a software
conversion from 24 to 32 bit going on in the driver here (and worse, it
might be doing two conversions: BGR->RGB and then RGB->RGBA). I’m not
absolutely certain which channel order the hardware favors internally,
though.

The specsheet says it also supports 16-bit data at the hardware level,
so it might actually be cheaper to convert the data by hand before
uploading it, if the performance hit is in pushing it over the bus. If
the camera is crappy, you might not have any noticable color loss by
dropping to 16-bit.
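[Editor's note] The 16-bit idea amounts to packing each pixel into R5G6B5 on the CPU before upload. A sketch (pack_rgb565 is an illustrative helper; the packed buffer could then go to glTexSubImage2D() with format GL_RGB and type GL_UNSIGNED_SHORT_5_6_5, available since OpenGL 1.2):

```cpp
#include <cassert>
#include <cstdint>

// Pack one 8-bit-per-channel pixel into R5G6B5, the same layout the
// SDL demo's 16-bit surfaces use. Halves the bytes pushed over the
// bus at the cost of some color precision.
inline uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```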

By any chance does your GL implementation have the
GL_ARB_texture_non_power_of_two extension? You don’t have to feed it
empty padding data in that case.
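[Editor's note] Checking for that extension under GL 1.x means parsing the space-separated string from glGetString(GL_EXTENSIONS). A sketch of the usual exact-token match (has_gl_extension is an illustrative helper):

```cpp
#include <cassert>
#include <cstring>

// True if `name` appears as a whole token in a space-separated GL
// extension string, e.g. the result of glGetString(GL_EXTENSIONS).
// A plain strstr() would wrongly match prefixes of longer names.
inline bool has_gl_extension(const char* extensions, const char* name) {
    const std::size_t len = std::strlen(name);
    const char* p = extensions;
    while ((p = std::strstr(p, name)) != nullptr) {
        const char end = p[len];
        if ((p == extensions || p[-1] == ' ') && (end == ' ' || end == '\0'))
            return true;
        p += len;
    }
    return false;
}
```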

                   glEnd();

I’m not sure it matters for one polygon, but you might want to try not
using immediate mode for a speed boost, too.
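[Editor's note] "Immediate mode" is the glBegin()/glEnd() block in paint(). The vertex-array alternative submits the same quad from client-side arrays in a single call. A sketch (build_quad is a hypothetical helper reproducing the four corners used above; the GL calls are shown as comments so the snippet stays self-contained):

```cpp
#include <array>
#include <cassert>

// The quad's texture coordinates and positions, matching the
// glBegin(GL_QUADS) block in paint() (margin 10, texture flipped
// vertically).
struct Quad {
    std::array<float, 8> texcoords; // u,v for each of the 4 corners
    std::array<float, 8> vertices;  // x,y for each of the 4 corners
};

inline Quad build_quad(float win_w, float win_h, float m) {
    Quad q;
    q.texcoords = {0, 1,  1, 1,  1, 0,  0, 0};
    q.vertices  = {m, m,  win_w - m, m,
                   win_w - m, win_h - m,  m, win_h - m};
    return q;
}

// Drawing it would then look like:
//   glEnableClientState(GL_VERTEX_ARRAY);
//   glEnableClientState(GL_TEXTURE_COORD_ARRAY);
//   glVertexPointer(2, GL_FLOAT, 0, q.vertices.data());
//   glTexCoordPointer(2, GL_FLOAT, 0, q.texcoords.data());
//   glDrawArrays(GL_QUADS, 0, 4);
```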

Other options: only upload every other frame. Only upload half of the
frame every other time, so you get most of the likely relevant data
without much loss.

Mostly: use a smaller texture…1024x1024 is pretty damned big for what
you’re doing here. And don’t block, either in Qt or the framegrabber.

As to why SDL is faster: it’s probably using a YUV overlay instead of
OpenGL. Much faster (since the facility basically exists for this very
purpose), but it sort of precludes the 3D bits you want, and it’s
probably not on a 20ms timer.

–ryan.

Hello all,

First of all, it’s a little off-topic.

I was wondering if anybody knows a method for decomposing a surface in
the following way. The idea is to map these surface rects to OpenGL textures
and avoid wasting texture memory on the transparent pixels.

First, I want to discard the transparent pixels from the surface. For
the visible ones, I want to group them into rectangles, the biggest
available. The next figure tries to show this:


....wwww....
.yyyxxxxzzz.
.yyyxxxxzzz.
.yyyxxxxzzz.
....vvvv....

The ‘.’ are transparent pixels; the different letters show the 5
decomposition rectangles. The different rects map to different-size
textures (256x256 for the ‘x’, 128x128 for the ‘y’ and ‘z’, and
128x32 for the ‘w’ and ‘v’ rects, for instance).

The ‘w’ and ‘v’ rects are not allowed to be part of the ‘x’ rect, to avoid
the texture-junction problem on rotated textures. This problem arises if
there are vertices of one rectangle that do not coincide with vertices of
the next rect.

     0--------0
     |        |
0----0        0----0
|    |        |    |
|    |        |    |
|    |        |    |
0----0        0----0
     |        |
     0--------0

The solution is providing more rects, the ‘w’ and ‘v’.

     0--------0
     |        |
0----0--------0----0
|    |        |    |
|    |        |    |
|    |        |    |
0----0--------0----0
     |        |
     0--------0

So, does anybody know an algorithm to solve this decomposition in a more or
less efficient way? Any comment is welcome.

Best regards,
Jorge

Hi,

I’m not really an expert in that field, but I’d guess that this method, in
whatever way you implement it, would be too slow (switching active textures
takes time, AFAIK) to justify implementing that optimization. Better to waste
a couple of pixels of texture memory (judging from your example, it also
doesn’t seem as if your textures are huge) and save lots of development time,
is my opinion. ;)

Sebastian

jorgefm at cirsa.com wrote:

[…]

Sebastian Beschke wrote:

[…]

Good point

Also, you would probably need to store more texture co-ordinates for
each object to cope with this. That will take up some memory, and by
the time you have finished you may as well have left the pixels in.

Plus you can sometimes get tearing effects where edges of textures don’t
exactly line up.



Tom Wilson (Lead Programmer)
Hazid Technologies Ltd. | Suite 21, SGCS Business Park,
Tel/Fax: +44-(0)115-922 4115 | Technology Drive, Beeston, Nottingham.
URL: http://www.hazid.com/ | NG9 2ND. United Kingdom.

First of all, thanks for all your comments!


Well, I’m not trying to get down to the pixel level, but to the minimum
texture size, maybe 64x64 blocks, where the texture-switching time is not a
drawback; a cost is always assumed.

Maybe I’ve not explained the problem quite well. I need some fullscreen
animations (1024x768x16bpp) where every frame has a lot of transparent
pixels; imagine a transition between two game screens. Every frame is a
1024x768 PNG. So a pre-process is worthwhile, to try to fit the whole
animation in memory and avoid accessing the disk at playback time.

Links to texture memory cache systems, or memory caches in general, are
welcome.

Best regards,
Jorge

René <renesd at gmail.com> writes:

Sorry for replying so late. It took me a while to check all the possibilities
that were proposed.

It looks like you are using hardware acceleration with SDL. If X
supports it, this is a really quick way to blit pixels.

I think OpenGL also uses hardware acceleration. It should be set up once,
commonly for all applications; “glxinfo” and “YaST” show it. Is there
anything else I could check?

With SDL you are creating a 16 bit window.

What are you using with opengl?

How can I check this ?

Have you tried using GL_RGB instead of GL_BGR(it might be doing the
conversion in a slow software routine)?

Swapping GL_RGB and GL_BGR results in changed colors (as is to be expected).

Have you tried using faster
filtering methods?

No, not yet. I guess I’m not enough of a C guru to claim that I could
implement a “faster” byte-swapping algorithm than the one that is used
internally.

Some paths in opengl are slow on different cards/drivers. So you
might be hitting a slow path somewhere. Try tweaking a few things and
see if it improves things.

What do you mean by path? I already tried both texturing and bitmap drawing.

Is vsync disabled for SDL, and not for opengl?

OpenGL surely uses double-buffering. I see no reason to disable it.

Have you tried using overlays? See $QTDIR/examples/opengl/overlay for
the source code.

The stuff I want to draw on top of the video image is HUD-like symbology. It
changes with every frame, so I see no reason to put it into an overlay
buffer.

Ryan C. Gordon <icculus icculus.org> writes:

#define TEXTURE_WIDTH 1024
#define TEXTURE_HEIGHT 1024
[…]
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, TEXTURE_WIDTH, TEXTURE_HEIGHT,
0, GL_BGR, GL_UNSIGNED_BYTE, texdata);

Is it possible to read in a smaller size and scale it up with GL_LINEAR
filtering? It might be faster and look about the same, since your
performance hit is all about the time it takes to move texture data to
the video chip and not about what it does with it afterwards…the Intel
852 has hardware trilinear filtering anyhow.

So this is the answer to Ryan’s posting.
Again, sorry for answering your extensive list of proposals so late.

Well, the company providing the frame grabber is just now coming up with a
driver version able to provide lower-resolution images (it is still a beta
version). For the time being it is PAL resolution (768x576). So what I did
was reduce the image size on my own using QImage methods. The result was that
it hurt the image quality more than it sped up the drawing process (unless I
go to ridiculous image sizes).

Does the framegrabber
actually give you anywhere near 1024x1024?

This is just the initial definition of the texture. Subsequent image uploads
are done via glTexSubImage2D(), so only 768x576 pixels should be going down
the pipe. (I am kind of suspicious by now, too; see above.)

// paint() is basically called by a Qt timer set up with 20 ms

Ugh, don’t do that. You’re probably not hitting this every 20ms. Better
to loop and draw when you can, or when you want. Definitely don’t do
this if grab_video() is going to block anyhow. Just do an infinite loop
that draws and dispatches Qt events.

I tried a couple of combinations. As the grabber function is still blocking,
I ended up threading the whole grabbing part out, so that I now have a very
simple draw loop which receives image data via global variables. But still no
performance gain.

Worry about CPU usage last.

// grab_video() blocks until a new image is received

Is the 3D thing over the picture doing any animation? If so, don’t block
here (or render on a 20ms timer!), which will help your framerate…and
only do a new texture upload when you actually got a read from the
framegrabber. If the 3D stuff isn’t animated, consider whether you
really need OpenGL here (might be faster to use the 2D facilities, and
blit a static, prerendered image over the video frames).

Well, unfortunately it has to be animated. It changes with every frame. It
consists of a 3D tunnel to be drawn conformally with the video picture, which
shows the landscape as seen by a video camera on a vehicle.

The specsheet for that video chip claims it can use YUV images as
texture data, but I don’t know if that’s exposed anywhere in OpenGL.
Chances are the framegrabber can do YUV data faster.

                   glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, video->width(), video->height(),
          GL_BGR, GL_UNSIGNED_BYTE, video->currentFrame());

Try GL_RGB, and if that doesn’t help, see if you can get the
framegrabber to feed you data with an always-full alpha channel…the
Intel specsheet says it takes 16-bit and 32-bit data (and suggests that
32-bit mode is RGBA, not BGRA), so there’s probably a software
conversion from 24 to 32 bit going on in the driver here (and worse, it
might be doing two conversions: BGR->RGB and then RGB->RGBA). I’m not
absolutely certain which channel order the hardware favors internally,
though.

The specsheet says it also supports 16-bit data at the hardware level,
so it might actually be cheaper to convert the data by hand before
uploading it, if the performance hit is in pushing it over the bus. If
the camera is crappy, you might not have any noticable color loss by
dropping to 16-bit.

I fiddled around a lot here. The only thing I could convince the frame
grabber to give me was an R5G6B5 picture (16 bit), which OpenGL also promised
to be able to convert to a texture. It worked, but with the same bad
performance.

By any chance does your GL implementation have the
GL_ARB_texture_non_power_of_two extension? You don’t have to feed it
empty padding data in that case.

                   glEnd();

I’m not sure it matters for one polygon, but you might want to try not
using immediate mode for a speed boost, too.

immediate mode ???

Other options: only upload every other frame. Only upload half of the
frame every other time, so you get most of the likely relevant data
without much loss.

Mostly: use a smaller texture…1024x1024 is pretty damned big for what
you’re doing here. And don’t block, either in Qt or the framegrabber.

As to why SDL is faster: it’s probably using a YUV overlay instead of
OpenGL. Much faster (since the facility basically exists for this very
purpose), but it sort of precludes the 3D bits you want, and it’s
probably not on a 20ms timer.

Could you please confirm:

- Combining OpenGL and SDL is done only for portability, easier input
handling and image loading; when they are combined, drawing is done with
OpenGL alone?

–ryan.

Again, thanks a lot for your help!