Single pixel drawing

Christopher_Thielen · July 14, 2002, 6:33pm

What’s the best way to color a single pixel? I tried putpixel() but as
I’ve heard from a lot of people now, it’s slow and shouldn’t be used. I
draw about 100 pixels to an 800x600 display per frame in 16 bpp. What’s
the best way to draw a single pixel? Should I just use SDL_FillRect?
(I’m basically looking for the fastest way to draw a pixel, as an
alternative to putpixel.) Thanks.

– chris (@Christopher_Thielen)

Loren_Osborn · July 14, 2002, 8:57pm

100 pixels per frame shouldn’t be much of a problem.
Also, if you aren’t drawing the pixels in any specific
pattern, a putpixel is probably the best you can
do. (You may be able to cache the color lookup if
you’re drawing a lot of pixels in the same color.)

-Loren

SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

Do You Yahoo!?
Yahoo! Autos - Get free new car price quotes

atrix2 · July 15, 2002, 1:30am

Drawing single pixels one at a time is inherently slow and should be avoided
if at all possible. That aside, I’m going to try to explain as best I can,
and hopefully it’ll make sense to you.

The fastest way to draw a pixel is to write to memory directly. Each SDL
surface has two members you will need to draw a pixel (assuming you know the
screen height and bits per pixel, which it looks like you do ^.^). The first
is called pixels (Surface->pixels); this is a pointer to the location in
memory that is the "screen" itself, which you can draw to directly. The other
member is called pitch.

On screen, pixels look like they’d be stored in a 2-dimensional array, but in
fact the screen is one long chunk of memory (which a 2D array also is, but
never mind that, hehe). The pitch of a surface is how many bytes to add to
your current position to get to the next line down. For example, in 800x600
mode in 16-bit color (assuming no padding between lines), the screen would be
one long chunk of memory 960,000 bytes long (800 width * 600 height * 2 bytes
per pixel). The first pixel would be at byte 0, the second at byte 2, the
third at byte 4, and so on until you got to the 801st pixel, which would be
at byte 1600. The 801st pixel also happens to be the pixel at 2D coordinates
(0,1), which is 1 pixel down (: You with me so far? If we wanted pixel (0,2),
we add another 1600 (800 pixels wide * 2 bytes per pixel) and get 3200. So we
add 1600 to a pixel’s address to move down 1 row. In this scenario the pitch
of the surface is 1600, since you add 1600 to a pixel to get to the one below
it.

Sometimes in certain video modes (especially windowed video modes), the pitch
is more than you’d think it should be, because each row of pixels is padded
with extra bytes (I think it has something to do with aligning rows for
optimized memory access, but I’m not 100% sure). That means that in 800x600
mode in 16-bit color, the pitch might not be 1600; it could be 1604 or 2048
or even more! Luckily we don’t need to work out the exact number, because SDL
tells us the pitch of a surface (Surface->pitch), so we have that part
covered.

To get the location of any pixel on the screen, if we want to draw to pixel
(x,y), we use the formula "location = (y * pitch) + x" (with pitch and x in
the same units). With that in mind, here’s a small function to draw a pixel:

void DrawPixel(SDL_Surface *Surface, int x, int y, Uint16 Color)
{
    /* pitch is in bytes; dividing by 2 converts it to 16-bit pixels */
    Uint16 *Pixel;
    Pixel = (Uint16 *)Surface->pixels + y * Surface->pitch / 2 + x;
    *Pixel = Color;
}

As you can see, we add the following to Surface->pixels (the beginning of the
screen): y*Surface->pitch/2 + x. It’s different from the equation above
because it has a pitch/2. That’s because the pitch is given in bytes and our
pointer moves in 2-byte increments (as we are in 16-bit color mode).

OK, enough with the technical info. Basically, if you want to draw single
pixels to the screen, try not to draw them one by one using a generic pixel
drawing function. If you can, copy/paste the last 2 statements of the code
above into the place (or places) where you want to draw your dots; this
should give it a pretty good boost.

The problem with calling a generic pixel drawing function is that each time
you draw a pixel, the CPU pushes all the arguments onto the stack, pushes the
return address, and transfers control to the drawing function. The function
then takes up a little more stack space for the local pointer (Uint16
*Pixel), does the math, moves your color into memory, pops everything off the
stack, and returns control to where the program was before the call. That
sounds like a lot of work, doesn’t it? If instead of a function you had that
pixel code directly inside a for loop, it would skip all the steps up to
doing the math, then move your color into memory, and it’s done. Lots less
work, isn’t it? I know if you had to do it by hand you’d want to do as little
work as possible, and so do our CPUs; they just live in much faster time
slices than we do and can’t complain about it (;
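
As a rough sketch of the "pixel code directly inside a for loop" idea above:
the Surface16 struct and the Dot type below are hypothetical stand-ins that
only mirror the pixels and pitch members described in this post, so the
example compiles without SDL. It assumes 16 bpp. With a real SDL surface you
would presumably also need to lock it (SDL_MUSTLOCK / SDL_LockSurface) before
touching the pixel memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SDL_Surface members used in the post. */
typedef struct {
    void *pixels;  /* start of pixel memory */
    int   pitch;   /* bytes per row, including any padding */
    int   w, h;    /* visible size in pixels */
} Surface16;

typedef struct { int x, y; } Dot;

/* Plot `count` dots with the address computation written inline in the loop. */
static void draw_dots(Surface16 *s, const Dot *dots, int count, uint16_t color)
{
    uint8_t *base = (uint8_t *)s->pixels;
    int i;

    for (i = 0; i < count; i++) {
        int x = dots[i].x, y = dots[i].y;
        uint16_t *row;

        if (x < 0 || y < 0 || x >= s->w || y >= s->h)
            continue;  /* skip dots that fall outside the surface */

        /* Step down by bytes (pitch), then across by 2-byte pixels. */
        row = (uint16_t *)(base + (size_t)y * (size_t)s->pitch);
        row[x] = color;
    }
}
```

Stepping down in bytes and only then casting to a 16-bit pointer avoids the
pitch/2 division, at the cost of one extra cast.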

Well, i hope this has helped some. Good luck with your coding Chris.

-Atrix

Loren_Osborn · July 15, 2002, 1:53am

While I agree that function calling overhead can be
expensive, declaring a function as “inline” eliminates
most of it.
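
A minimal sketch of what that might look like, assuming a C99 compiler and
16 bpp; put_pixel16 is a hypothetical name, not an SDL function:

```c
#include <stdint.h>

/* `static inline` asks the compiler to expand the body at each call site,
 * so an optimizing build normally has no call/return overhead left. */
static inline void put_pixel16(void *pixels, int pitch, int x, int y,
                               uint16_t color)
{
    /* pitch is in bytes, so step rows through a byte pointer first */
    uint16_t *row = (uint16_t *)((uint8_t *)pixels + y * pitch);
    row[x] = color;
}
```

Whether the call is actually inlined is up to the compiler, so it is worth
checking the generated code if it matters.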

That leaves two other potential bottlenecks. The first is
pixel addressing (which you discussed). While the
multiplies can be expensive, they can only be replaced
with faster operations (like adds and increments) when
the pixels are drawn in a predictable shape (such as a
curve or a line).

The bottleneck you did not address is converting the
desired pixel color into the pixel format of the
surface. This is especially wasteful if the same
color is being converted repeatedly (which is why I
advised caching).
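
To illustrate the caching idea, here is a small sketch. In SDL the conversion
would normally be done with SDL_MapRGB; pack_rgb565 below is a hypothetical
stand-in that assumes a fixed RGB565 layout so the example is self-contained.

```c
#include <stdint.h>

/* Hypothetical converter, assuming a 16-bit RGB565 pixel format. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Plot n pixels of one color in a row: convert once, reuse many times. */
static void plot_points(uint16_t *row, const int *xs, int n,
                        uint8_t r, uint8_t g, uint8_t b)
{
    const uint16_t packed = pack_rgb565(r, g, b);  /* cached outside the loop */
    int i;

    for (i = 0; i < n; i++)
        row[xs[i]] = packed;
}
```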

I hope that helped,

-Loren

Robert_Wohleb · July 16, 2002, 10:30am

One of the best ways to add some optimizations to pixel operations is to use
bit-shifting when doing pointer calculations. Bit-shifting mixed with
addition can add up to a big bonus when you have to do a lot of pointer
calculations per frame.

~Rob

Neil_Bradley · July 16, 2002, 10:38am

That leaves two other potential bottelnecks:
pixel addressing (which you discussed). While the
multiplies can be expensive, they can only be replaced

On what platforms that SDL supports are multiplies expensive? Don’t say
x86 or PPC…

–>Neil-------------------------------------------------------------------------------
Neil Bradley What are burger lovers saying
Synthcom Systems, Inc. about the new BK Back Porch Griller?
ICQ #29402898 “It tastes like it came off the back porch.” - Me

Loren_Osborn · July 16, 2002, 10:56am

They are much less expensive than they used to be, but
Multiplies are still more expensive than adds, and you
don’t want to do too many multiplies in an inner loop
if you can help it… (It just so happens that the
reason working with the blitters is a pain, is all the
bit-packing to reduce the number of multiplies and
divides: while conventional divides are slow, the
blitter only uses shifts, which are fast… In the
blitters the bottelneck is the multiplies!)

-Loren


Neil_Bradley · July 16, 2002, 11:08am

They are much less expensive than they used to be, but
Multiplies are still more expensive than adds, and you

How so? At least on the Pentium, Multiplies are 1 clock or less, just like
shifts.

Don’t get me wrong - In practice I agree with you and do shifts instead of
multiplies, but I really am curious if the multiplies on any other
processors are really slower than shifts.

–>Neil

Jacek_Wojdel · July 17, 2002, 2:51am

But since the Pentium Pro you can issue two instructions per cycle. Those can
be 2 adds, 1 add and 1 shift, etc. You cannot, however, issue more than one
multiplication, and it blocks both ALUs. Moreover, shifts and adds have a
latency of 1 cycle, while multiplication has a latency of 4 cycles. In total,
shift+add will execute in between 2 and 4 cycles, while
mul+add will execute in between 5 and 7 cycles. That’s roughly twice the
performance.
Cheers,
Jacek

–
±------------------------------------+
|from: J.C.Wojdel |
| J.C.Wojdel at cs.tudelft.nl |
±------------------------------------+

atrix2 · July 17, 2002, 11:40am

I’m glad people still pay attention to these things (:

atrix2 · July 17, 2002, 12:36pm

OK, so shifting seems preferable to multiplication. How would you take something like y*pitch+x and figure out what shift to use for a pitch that is determined at run-time?

Loren_Osborn · July 17, 2002, 1:12pm

You can ONLY use a shift instead of a multiply IF
pitch is a power of 2. You could then determine the
shift that corresponds to pitch before drawing a large
batch of pixels. The problem is that you would then need
to BRANCH depending on whether to shift or
multiply. Unfortunately, a conditional branch can be more
expensive than the original multiply!
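
A sketch of that shift-or-multiply decision, assuming 16 bpp and a pitch given
in bytes; pitch_shift and offset16 are hypothetical helper names:

```c
/* Returns log2(pitch) if pitch is a power of two, otherwise -1.
 * Meant to be called once, before drawing a batch of pixels. */
static int pitch_shift(unsigned pitch)
{
    int shift = 0;

    if (pitch == 0 || (pitch & (pitch - 1)) != 0)
        return -1;  /* zero or more than one bit set: not a power of two */
    while ((1u << shift) != pitch)
        shift++;
    return shift;
}

/* Byte offset of pixel (x, y) at 16 bpp. The `if` is the per-pixel branch
 * discussed above, which can cost more than the multiply it replaces. */
static unsigned offset16(unsigned pitch, int shift, unsigned x, unsigned y)
{
    if (shift >= 0)
        return (y << shift) + x * 2;
    return y * pitch + x * 2;
}
```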

What you can do instead is find your initial pixel,
then move incrementally relative to it. For example,
to draw a line, you are generally moving from one
pixel in one of 8 directions to an adjacent pixel. To
move up a row, subtract pitch. To move over (left or
right), add or subtract sizeof(pixel), and to move down,
add pitch. So now you’re adding/subtracting instead
of multiplying. Understand?
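
The incremental approach could be sketched like this for a straight run of
pixels in one of the 8 directions. It assumes 16 bpp and a pitch in bytes,
draw_run16 is a hypothetical name, and the caller is assumed to keep the whole
run inside the surface:

```c
#include <stdint.h>

/* Draw n pixels starting at (x, y), stepping (dx, dy) each time, where dx
 * and dy are each -1, 0 or 1. Only the first address needs a multiply. */
static void draw_run16(void *pixels, int pitch, int x, int y,
                       int dx, int dy, int n, uint16_t color)
{
    uint8_t *p = (uint8_t *)pixels + y * pitch + x * (int)sizeof(uint16_t);
    const int step = dy * pitch + dx * (int)sizeof(uint16_t); /* computed once */

    while (n > 0) {
        *(uint16_t *)p = color;
        if (--n > 0)
            p += step;  /* move to the neighbouring pixel with a single add */
    }
}
```

A general line routine would choose between two such steps per pixel, but the
addressing stays add-only in the same way.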

-Loren

Neil_Bradley · July 17, 2002, 2:02pm

Neil Bradley wrote:

How so? At least on the Pentium, Multiplies are 1 clock or less, just like
shifts.

Jacek Wojdel wrote:

But since Pentium Pro you can issue two instructions per cycle. Those can be
2 adds, 1 add and 1 shift etc. You cannot however issue more than one
multiplication and it blocks both ALUs. Moreover shifts and adds have
latency of 1 cycle, while multiplication has latency of 4 cycles. In total,
if you have shift+add it will execute in between 2 and 4 cycles, while
mul+add will execute in between 5 and 7 cycles. That’s twice the
performance.

I can’t find any references to anything stating that a multiply is any
more than 1 instruction on a PII/PIII/P4. Not only that, you haven’t
considered that you’ll stall one of the pipelines if the add relies on the
multiply (which it will in this case).

FWIW, I compiled my emulator that does lots of raster graphics, removed
all throttling, and changed shifts to multiplies. It made zero difference
(and yes, I shut off optimization and did a disassembly of the code to
ensure it wasn’t turning my multiplies into shifts) between multiplies and
shifts. This is on a 1 GHz PIII, and the graphics processing takes 60% of
overall execution time.

There are more factors than just the instructions themselves. Depending
upon the shift, it may require reloading the cl register which can do who
knows what to the optimization capabilities of the compiler. At this point
it’s a wash.

–>Neil

Neil_Bradley · July 17, 2002, 2:18pm

What you can do instead is find your initial pixel,
then move incrementally relative to it. For example,
to draw a line, you are generally moving from one
pixel in one of 8 directions to an adjacent pixel. To
move up a row, subtract pitch. To move over (left or
right), add or subtract sizeof(pixel), and to move down,
add pitch. So now you’re adding/subtracting instead
of multiplying. Understand?

Definitely good advice.

Another bit of advice I’d like to give people is to not use a generic,
single function to put single pixels of all color depths (or all
endianness for that matter), especially if lots of individual dots are
being drawn.

Instead, create a function pointer, and set whatever pixel
depth/endianness your given target uses :

static void (*PutPixel)(UINT32 u32X, UINT32 u32Y, UINT32 u32Pixel) = NULL;

…
	switch (u8Depth)
	{
		case 8:
			PutPixel = PutPixel8bpp;
			break;
		case 16:
			if (bBigEndian)
				PutPixel = PutPixel16bppBigEndian;
			else
				PutPixel = PutPixel16bppLittleEndian;
			break;
		case 32:
			if (bBigEndian)
				PutPixel = PutPixel32bppBigEndian;
			else
				PutPixel = PutPixel32bppLittleEndian;
			break;

…

And in the code:

	PutPixel(x, y, 0x34); // Or whatever...
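
For anyone who wants something compilable, here is a self-contained sketch of
the same function-pointer idea. All names in it (PutPixelFn, put8, put16,
put32, select_put_pixel) are hypothetical, the writers take the pixel buffer
and pitch as explicit arguments, and the endianness variants are left out to
keep it short:

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*PutPixelFn)(void *pixels, int pitch, int x, int y,
                           uint32_t color);

/* One specialised writer per depth; pitch is in bytes in all of them. */
static void put8(void *pixels, int pitch, int x, int y, uint32_t color)
{
    ((uint8_t *)pixels + y * pitch)[x] = (uint8_t)color;
}

static void put16(void *pixels, int pitch, int x, int y, uint32_t color)
{
    ((uint16_t *)((uint8_t *)pixels + y * pitch))[x] = (uint16_t)color;
}

static void put32(void *pixels, int pitch, int x, int y, uint32_t color)
{
    ((uint32_t *)((uint8_t *)pixels + y * pitch))[x] = color;
}

/* Pick the writer once, when the video mode is known.
 * Returns NULL for depths this sketch does not handle. */
static PutPixelFn select_put_pixel(int bits_per_pixel)
{
    switch (bits_per_pixel) {
    case 8:  return put8;
    case 16: return put16;
    case 32: return put32;
    default: return NULL;
    }
}
```

The depth check then happens once at setup instead of once per pixel.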

It seems optimization is a lost art. I really wish software developers had
to go through the equivalent of boot camp where they had almost no memory
nor code space to accomplish tasks to give them some respect for
resources.

–>Neil

Jacek_Wojdel · July 18, 2002, 3:20am

I can’t find any references to anything stating that a multiply is any
more than 1 instruction on a PII/PIII/P4. Not only that, you haven’t
considered that you’ll stall one of the pipelines if the add relies on the
multiply (which it will in this case).

Well, check out
http://developer.intel.com/design/PentiumII/manuals/242816.htm
First go to page 22 (2-15) and look up table 2-3. It clearly states that
multiplication has a latency of 4 cycles while adds and shifts have only 1
cycle latency.
Secondly, check page 47 (3-23) for information on which instructions can
be paired (and thus executed in parallel).
Those are the only two bits of information that I based my message on. I
don’t refer here to any specific code example. You’re right that in the case
of y*w+x, the addition must wait for the mul to finish, just as it needs to
wait in the case of (y<<p)+x. However, in a tight and unrolled loop, the
compiler might schedule it in such a way that the shift takes place at the
same time as the addition from the previous iteration.
Of course it does not make sense to design a generic approach that
substitutes each mul with lots of branching, shifts and additions. If,
however, you do have prior knowledge about your code, and you know that a
given mul will always be by a power of 2, just use a shift (even though a
non-immediate shift is not pairable, just like mul, it has lower latency).
Shifts are cheaper by definition, and this way you give your compiler a
chance to optimize things better (even if the current compiler does not make
any difference, a future one might).

FWIW, I compiled my emulator that does lots of raster graphics, removed
all throttling, and changed shifts to multiplies. It made zero difference
(and yes, I shut off optimization and did a disassembly of the code to
ensure it wasn’t turning my multiplies into shifts) between multiplies and
shifts. This is on a 1Ghz PIII, and the graphics processing takes 60% of
overall execution time.

I agree with your results, especially as you removed optimizations. That
means in neither case did the compiler try to take advantage of OoO
execution, pairing of instructions, etc. For such code there really is no
difference between shift and mul.
It is just a matter of providing the optimization routines with the
knowledge about your code that only you as a programmer possess. Writing:
x<<p
you say: "this is an operation that can be done by shifting the value of x".
If you write:
x*n
you say: "this is a multiplication of x by any possible number n".
If, however, n is always a power of 2 in your code (say it is the
GL texture size), you essentially withhold this information from your
compiler. It may not make a difference on your machine, with your compiler
and on your operating system. If, however, sometime later somebody decides to
compile your code on some embedded system that must put a systemwide trap on
each multiplication (don’t ask me why, just imagine), this small piece of
information may prove to be crucial for the speed of the code.

There are more factors than just the instructions themselves. Depending
upon the shift, it may require reloading the cl register which can do who
knows what to the optimization capabilities of the compiler. At this point
it’s a wash.

And isn’t reloading a register a simple matter of renaming now? I thought
it has been since the P4. So different parts of the code occupying the
pipeline may even see different versions of CL and be happy with it. But I
agree with you that there is no way of telling what optimizations will do
with your code. For me, that means you should program things in such a way
that the optimizer has the most available knowledge about your code.
If it thinks that flushing CL is a bad idea, it can always express any shift
as an appropriate multiplication (I know, it’s an additional xor and setbit,
but you suggested a disaster because of grabbing CL, so it will be worth
it). It is not possible to efficiently change a mul into a shift without
external knowledge.
And anyway, we’re way off topic here.
Cheers,
Jacek

Robert_Wohleb · July 18, 2002, 10:34am

If you have a multiplier that never changes, it can be optimized with
shifting, even if it isn’t a power of two. Shift by the power of two that’s
closest to, but less than, the target and add the difference. This will
really help with hard-coded multipliers.
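
As a small worked example of the shift-and-add idea, using row lengths that
came up in this thread: 1600 = 1024 + 512 + 64 and 640 = 512 + 128. The
function names are hypothetical, and many optimizing compilers will already
make this substitution for constant multipliers on their own.

```c
/* y * 1600 written as shifts and adds: 1600 = 1024 + 512 + 64 */
static unsigned mul1600(unsigned y)
{
    return (y << 10) + (y << 9) + (y << 6);
}

/* y * 640 written as shifts and adds: 640 = 512 + 128 */
static unsigned mul640(unsigned y)
{
    return (y << 9) + (y << 7);
}
```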

As with any optimization attempt, it’s hard to determine the benefit. Just
watching the program run won’t help. You’re going to need to time the
execution of various code segments at as detailed a level as possible. A few
microseconds here and a few more gained there can add a few more frames per
second. It isn’t necessarily about huge benefits. The trick to optimizing
code is really guessing which optimized code will give the best benefit
overall. It also helps to use #ifdef when writing optimization routines so
that it is easy to compare code speeds.
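
One way the #ifdef approach might look, as a sketch only. USE_SHIFT,
ROW_BYTES, row_offset and time_offsets are hypothetical names, the row length
is assumed to be 1024 bytes, and clock() from the C standard library is a
fairly coarse timer, so a real measurement would want many iterations:

```c
#include <time.h>

#define ROW_BYTES 1024u  /* assumed power-of-two pitch, for illustration */

/* Build with -DUSE_SHIFT to time the shift variant, without it to time the
 * multiply variant, then compare the two runs. */
static unsigned row_offset(unsigned y)
{
#ifdef USE_SHIFT
    return y << 10;
#else
    return y * ROW_BYTES;
#endif
}

/* Runs the addressing `iterations` times and returns elapsed CPU seconds.
 * The checksum gives the loop an observable result. */
static double time_offsets(unsigned iterations, unsigned *checksum)
{
    clock_t start = clock();
    unsigned sum = 0, i;

    for (i = 0; i < iterations; i++)
        sum += row_offset(i & 511u);
    *checksum = sum;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Both variants must produce the same checksum, which doubles as a quick
correctness check when swapping implementations.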

~Rob

atrix2 · July 18, 2002, 3:08pm

How about factoring the pitch into powers of 2 at run time, then generating your full pixel drawing function (if there are 100 dots drawn, put the for loop in and such) and “compiling” it to native machine code? How’s that for efficiency?

Bob_Pendleton · July 18, 2002, 3:41pm

Atrix Wolfe wrote:

how bout at run time you factor the pitch into powers of 2 and then
create your full pixel drawing function (if there are 100 dots drawn put
the for loop in and such) and “compile” it to the native machine code.
Hows that for efficiency?

That’s pretty good; we used to do things like that back in the days of 10
megahertz 286s because we needed to. In these days, where a stinking slow
computer runs at 100 megahertz and a reasonably new computer runs at 800
to 2200 megahertz, it seems like a waste of life span to even worry
about this for an application that is writing 100 pixels. Let’s see: if
we can do (s*y)+x using 1 shift and 1 add, I can do it in 2 cycles and I
spend 200 cycles on arithmetic; if I need 2 shifts and 2 adds, I
spend 400 cycles; and if it takes more shifts and adds than that, then I
should use a multiply, because a multiply and an add are 5 cycles, or 500
cycles for 100 pixels (all this cycle information is from previous posts and
seems to be based on the original Pentium, which you can buy at thrift stores
for a couple of bucks…). Anyway, the difference is at most 300 cycles per
frame.

So, at 1000 megahertz 300 cycles is 0.0000003 seconds. That’s how much
you might save. Wow, remind me to waste hours arguing about how to save
0.0000003 seconds again. Life is too short to be worrying about
0.0000003 seconds. Get some perspective, people.

	Bob Pendleton--

±-----------------------------------------+

Bob Pendleton, an experienced C/C++/Java +
UNIX/Linux programmer, researcher, and +
system architect, is seeking full time, +
consulting, or contract employment. +
Resume: http://www.jump.net/~bobp +
Email: @Bob_Pendleton +
±-----------------------------------------+

Neil_Bradley · July 18, 2002, 3:54pm

So, at 1000 megahertz 300 cycles is 0.0000003 seconds. That’s how much
you might save wow, remind me to waste hours arguing about how to save
0.0000003 seconds again. Life is too short to be worrying about
0.0000003 seconds. Get some perspective people.

In this specific case I’d agree with you (but your numbers are off by
quite a bit, since the multiply/shift isn’t the only thing going on), but
the general attitude of people is to forget about optimization of all
forms because they are too lazy to do otherwise. The times I have
had to undo someone else’s lousy designs in the embedded world
are too numerous to mention. I’m just happy that it’s even being
discussed, because that means people are thinking about it. Having CPUs as
fast as we do now has just let people who couldn’t normally be
programmers become “programmers”, and we wind up taking computing
back 10 years with the likes of interpreted Java and Visual Basic. This is
why OSes take eons to load up. This is why apps are big, bloated pieces of
garbage. No one needs to conserve.

It takes negligible extra time to be smart about programming, and the
attitude portrayed above is just one of many that contribute to the horrid
computing experiences on all platforms.

That being said, I’m quite happy with SDL and its performance. I switched
from using DirectX to using SDL, and I’m happy to say the performance
"degradation" was almost immeasurable. Not an easy feat for a cross
platform library!

–>Neil

atrix2 · July 18, 2002, 4:17pm

Yeah, I just like looking at this from a scientific angle, seeing what is the
most efficient way for execution, even if it isn’t the most efficient for
development. It’s kinda neat how deep you can go just to do something as
simple as drawing a pixel.