# Fastest way to 'expand' a scanline in software

Anyone know of a fast algorithm for expanding a scanline horizontally?
Assume I have a scanline that is 160 pixels wide, and consists of color
indices (into a color table).

What would be the fastest way to scale this by (say) 4? My idea was to
simply look at each pixel and copy it into a new scanline in 4
consecutive places.

Is there a faster way to do this?

Steve

Anyone know of a fast algorithm for expanding a scanline horizontally?

Assume I have a scanline that is 160 pixels wide, and consists of
color
indices (into a color table).

What would be the fastest way to scale this by (say) 4? My idea was
to
simply look at each pixel and copy it into a new scanline in 4
consecutive places.

In this particular case, make a new lookup table for 32 bit values with
the index shifted in all the places and “see” the destination scanline
as int*. I can’t think of a faster way, except not doing it; this might
be possible or not, depending on exactly what are you trying to do.

Lic. Gabriel Gambetta
ARTech - GeneXus Development Team
ggambett at artech.com.uy

Anyone know of a fast algorithm for expanding a scanline
horizontally?

In this particular case, make a new lookup table for 32 bit values with
the index shifted in all the places and “see” the destination scanline
as int*. I can’t think of a faster way, except not doing it; this might
be possible or not, depending on exactly what are you trying to do.

I’m not sure that’s what I need, but then I’m not sure I explained it
correctly. The color table indexes 32-bit values, but the engine itself
is 8-bit. The 32-bit values are converted to the screen bpp at program
start.

Say the following is a subsection of the original scanline (from index 0
to 3), with a ‘|’ indicating a new memory location:

|7|13|67|34|

What I need is to copy this into a framebuffer as follows:

|7|7|7|7|13|13|13|13|67|67|67|67|34|34|34|34|

This will scale the scanline by 4 horizontally. To scale by 4 vertically,
I just repeat that scanline 4 times.

SteveOn Monday 13 October 2003 03:40 pm, Gabriel Gambetta wrote:

you might be able to use rle encoding somehow.

rle = run length encoding.

in pcx files this is how they do compression, if there are 50 black pixels,
instead of putting all 50 black pixels in, it essentiay just says “black
pixel * 50” (in binary of course).

SDL supports rle encoded surfaces so you might be able to make some kinda
hack or check out the rle encoding surfaces code and maybe come up with some
good algorithm yourself.> ----- Original Message -----

To:
Sent: Monday, October 13, 2003 11:34 AM
Subject: Re: [SDL] Fastest way to ‘expand’ a scanline in software

On Monday 13 October 2003 03:40 pm, Gabriel Gambetta wrote:

Anyone know of a fast algorithm for expanding a scanline
horizontally?

In this particular case, make a new lookup table for 32 bit values with
the index shifted in all the places and “see” the destination scanline
as int*. I can’t think of a faster way, except not doing it; this might
be possible or not, depending on exactly what are you trying to do.

I’m not sure that’s what I need, but then I’m not sure I explained it
correctly. The color table indexes 32-bit values, but the engine itself
is 8-bit. The 32-bit values are converted to the screen bpp at program
start.

Say the following is a subsection of the original scanline (from index 0
to 3), with a ‘|’ indicating a new memory location:

|7|13|67|34|

What I need is to copy this into a framebuffer as follows:

|7|7|7|7|13|13|13|13|67|67|67|67|34|34|34|34|

This will scale the scanline by 4 horizontally. To scale by 4 vertically,
I just repeat that scanline 4 times.

Steve

SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

Say the following is a subsection of the original scanline (from index
0
to 3), with a ‘|’ indicating a new memory location:

|7|13|67|34|

What I need is to copy this into a framebuffer as follows:

|7|7|7|7|13|13|13|13|67|67|67|67|34|34|34|34|

Yes, I did understand. What I said is : first make a lookup table like
this

0 -> 0x00000000
1 -> 0x01010101
2 -> 0x02020202

scanline as an int*. The expanding loop is

while (something)
*dest++ = table[*src++]

Good luck
–Gabriel

Lic. Gabriel Gambetta
ARTech - GeneXus Development Team
ggambett at artech.com.uy> ----- Original Message -----

Sent: Lunes, 13 de Octubre de 2003 03:35 p.m.
To: sdl at libsdl.org
Subject: Re: [SDL] Fastest way to ‘expand’ a scanline in software

On Monday 13 October 2003 03:40 pm, Gabriel Gambetta wrote:

Anyone know of a fast algorithm for expanding a scanline
horizontally?

In this particular case, make a new lookup table for 32 bit values
with the index shifted in all the places and “see” the destination
scanline as int*. I can’t think of a faster way, except not doing it;
this might be possible or not, depending on exactly what are you
trying to do.

I’m not sure that’s what I need, but then I’m not sure I explained it
correctly. The color table indexes 32-bit values, but the engine itself

is 8-bit. The 32-bit values are converted to the screen bpp at program
start.

Say the following is a subsection of the original scanline (from index 0

to 3), with a ‘|’ indicating a new memory location:

|7|13|67|34|

What I need is to copy this into a framebuffer as follows:

|7|7|7|7|13|13|13|13|67|67|67|67|34|34|34|34|

This will scale the scanline by 4 horizontally. To scale by 4
vertically,
I just repeat that scanline 4 times.

Steve

SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

to 3), with a ‘|’ indicating a new memory location:
|7|13|67|34|

What I need is to copy this into a framebuffer as follows:
|7|7|7|7|13|13|13|13|67|67|67|67|34|34|34|34|

Yes, I did understand. What I said is : first make a lookup table like
this

0 -> 0x00000000
1 -> 0x01010101
2 -> 0x02020202

scanline as an int*. The expanding loop is

while (something)
*dest++ = table[*src++]

Sorry I didn’t understand; that’s pretty clever. I’ll give it a try. In
combination with dirty-rectangle updates, it can be almost as fast as
before.

Thanks,
SteveOn Monday 13 October 2003 04:48 pm, Gabriel Gambetta wrote:

— Gabriel Gambetta wrote: > > Say the
following is a subsection of the original scanline (from

index
0

to 3), with a ‘|’ indicating a new memory location:

|7|13|67|34|

What I need is to copy this into a framebuffer as follows:

|7|7|7|7|13|13|13|13|67|67|67|67|34|34|34|34|

Yes, I did understand. What I said is : first make a lookup table
like
this

0 -> 0x00000000
1 -> 0x01010101
2 -> 0x02020202

scanline as an int*. The expanding loop is

while (something)
*dest++ = table[*src++]

If you really need to speed this up even more (Remember, premature
optimization is the root of all evil) then look up Duff’s device. Be
warned, it’s ugly

OF course, the usual caveat apples - measure how long it takes before
and after “optimization” to make sure it really is one :)> Good luck

–Gabriel

Lic. Gabriel Gambetta
ARTech - GeneXus Development Team
ggambett at artech.com.uy

-----Original Message-----
From: Stephen Anthony [mailto:stephena at roadrunner.nf.net]
Sent: Lunes, 13 de Octubre de 2003 03:35 p.m.
To: sdl at libsdl.org
Subject: Re: [SDL] Fastest way to ‘expand’ a scanline in software

On Monday 13 October 2003 03:40 pm, Gabriel Gambetta wrote:

Anyone know of a fast algorithm for expanding a scanline
horizontally?

In this particular case, make a new lookup table for 32 bit values
with the index shifted in all the places and “see” the destination
scanline as int*. I can’t think of a faster way, except not doing
it;
this might be possible or not, depending on exactly what are you
trying to do.

I’m not sure that’s what I need, but then I’m not sure I explained it

correctly. The color table indexes 32-bit values, but the engine
itself

is 8-bit. The 32-bit values are converted to the screen bpp at
program
start.

Say the following is a subsection of the original scanline (from
index 0

to 3), with a ‘|’ indicating a new memory location:

|7|13|67|34|

What I need is to copy this into a framebuffer as follows:

|7|7|7|7|13|13|13|13|67|67|67|67|34|34|34|34|

This will scale the scanline by 4 horizontally. To scale by 4
vertically,
I just repeat that scanline 4 times.

Steve

SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

Want to chat instantly with your online friends? Get the FREE Yahoo!
Messenger http://mail.messenger.yahoo.co.uk

Remember to use unsigned byte or better cast the src pointed value to
unsigned int, because byte might result to use of negative index.On Monday 13 October 2003 22:46, Stephen Anthony wrote:

On Monday 13 October 2003 04:48 pm, Gabriel Gambetta wrote:

to 3), with a ‘|’ indicating a new memory location:
|7|13|67|34|

What I need is to copy this into a framebuffer as follows:
|7|7|7|7|13|13|13|13|67|67|67|67|34|34|34|34|

Yes, I did understand. What I said is : first make a lookup table
like this

0 -> 0x00000000
1 -> 0x01010101
2 -> 0x02020202

scanline as an int*. The expanding loop is

while (something)
*dest++ = table[*src++]

Sorry I didn’t understand; that’s pretty clever. I’ll give it a try.
In combination with dirty-rectangle updates, it can be almost as
fast as before.

What I need is to copy this into a framebuffer as follows:

|7|7|7|7|13|13|13|13|67|67|67|67|34|34|34|34|

This will scale the scanline by 4 horizontally. To scale by 4
vertically,
I just repeat that scanline 4 times.

Your system’s vector unit (Altivec, SSE, etc) can probably beat the hell
out of any generic C algorithm for this particular problem, if you’re
willing to write non-portable, messy code, and your data happens to land
on a 16-byte alignment. Read blocks of 16 bytes into one register,
duplicate them out to more registers (four elements, expanded, per
register), and store them back to RAM. Assuming your program can
accomodate the requirements for vectorization, this particular case will
probably land somewhere between 2 and 20 times faster than anything you
can write in portable C.

A lookup table, as gabriel suggested, can spank your CPU cache…in many
cases, it’s faster to continually recalculate expensive equasions on
modern CPUs than it is to continually reference a lookup table. That
being said, it might be best to read a Uint8, use bitwise operators to
duplicate it across all four bytes of a Uint32, and then write that out
to memory. On an x86, we’re talking a couple of opcodes with no
roundtrip to memory to duplicate a byte across a register.

Remember to profile everything before assuming it is faster, and There
Ain’t No Such Thing As The Fastest Code.

–ryan.

At 08:34 PM 10/13/2003, Stephen Anthony wrote:

Anyone know of a fast algorithm for expanding a scanline
horizontally?

In this particular case, make a new lookup table for 32 bit values with
the index shifted in all the places and “see” the destination scanline
as int*. I can’t think of a faster way, except not doing it; this might
be possible or not, depending on exactly what are you trying to do.

I’m not sure that’s what I need, but then I’m not sure I explained it
correctly. The color table indexes 32-bit values, but the engine itself
is 8-bit. The 32-bit values are converted to the screen bpp at program
start.

I think what he explains does apply to you. Instead of copying the source
value 4 times, the source value is used as an index into an 256 ints
lookup table (Which contains 0x0, 0x01010101, 0x02020202, … 0xFFFFFFFF).
And that int is directly copied into the dest.

Say the following is a subsection of the original scanline (from index 0
to 3), with a ‘|’ indicating a new memory location:

|7|13|67|34|

What I need is to copy this into a framebuffer as follows:

|7|7|7|7|13|13|13|13|67|67|67|67|34|34|34|34|

This will scale the scanline by 4 horizontally. To scale by 4 vertically,
I just repeat that scanline 4 times.

It might be cheaper to scale it by 4 horizontally as late as possible (When
copying to the screen buffer). So when you are scaling by 4 vertically
instead of repeating/copying the scanline do the *4 scaling at that moment
(For example by multiplying the source by 0x01010101 or using the LUT way
explained above, or by simply copying the value to the 4 destination bytes,
whichever method is the fastest). This might be faster because less data
is involved (The source scanline now uses 4 times less memory). Only
profiling will tell.

Regards,
Dimitri>On Monday 13 October 2003 03:40 pm, Gabriel Gambetta wrote: