[slightly OT] Portable float packing/unpacking

Hello.

I use the integer packing/unpacking code from http://cr.yp.to
to portably read and write integers to disk. Typical functions
look like this:

(packs a 32 bit unsigned integer into a 4 element char array
in big-endian byte order)

void uint32_packb(char n[4], uint32 ui)
{
n[3] = ui & 0xff; ui >>= 8;
n[2] = ui & 0xff; ui >>= 8;
n[1] = ui & 0xff;
n[0] = ui >> 8;
}
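
The little-endian counterpart, uint32_packl, follows the same
pattern; roughly this (a sketch, not copied from the cr.yp.to code):

void uint32_packl(char n[4], uint32 ui)
{
    n[0] = ui & 0xff; ui >>= 8;   /* lowest byte first */
    n[1] = ui & 0xff; ui >>= 8;
    n[2] = ui & 0xff;
    n[3] = ui >> 8;               /* highest byte last */
}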

uint32 is a typedef that’s worked out at compile time. Now, I
was wondering. In the name of portability (as I’m going to
be reading and writing a lot of 3DS files for my SDL/OpenGL
project), assuming I work out an appropriate typedef for
’float32’ at compile time, how can I do the above with
floating point numbers? You can’t use the bitwise shift
operators with floating point numbers in C, for a start.

I basically need to write ‘float32_packl’ (packs a 32 bit
floating point into a 4 byte char array in little endian
byte order).

I would like to avoid byte-swapping macros unless they’re
really unavoidable…

thanks, portability gurus.
a1

I basically need to write ‘float32_packl’ (packs a 32 bit
floating point into a 4 byte char array in little endian
byte order).

Perhaps not the most elegant way of accomplishing this, but this works
fine for any data type as long as the data sizes match:

void float32_packl(char *n, float32 f) {
uint32 ui = (uint32) &f;
uint32_packl(n, ui);
}

// Martin

On Sun, 12 Feb 2006, mal content wrote:

Ah, I didn’t realise it could be that simple. Thanks!

a1

On 2/12/06, Martin Storsjö wrote:

On Sun, 12 Feb 2006, mal content wrote:

I basically need to write ‘float32_packl’ (packs a 32 bit
floating point into a 4 byte char array in little endian
byte order).

Perhaps not the most elegant way of accomplishing this, but this works
fine for any data type as long as the data sizes match:

void float32_packl(char *n, float32 f) {
uint32 ui = (uint32) &f;
uint32_packl(n, ui);
}

Perhaps not the most elegant way of accomplishing this, but this works
fine for any data type as long as the data sizes match:

void float32_packl(char *n, float32 f) {
uint32 ui = (uint32) &f;
uint32_packl(n, ui);
}

Ah, I didn’t realise it could be that simple. Thanks!

Be really careful with this. Optimizing compilers will sometimes keep
floating point values in the floating point stack, so (uint32)&f will
be garbage in this case.

-Sam Lantinga, Senior Software Engineer, Blizzard Entertainment

Be really careful with this. Optimizing compilers will sometimes keep
floating point values in the floating point stack, so (uint32)&f will
be garbage in this case.

Unions and the “volatile” keyword are your friends here.

–ryan.
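
A minimal sketch of the union idea, assuming the uint32/float32
typedefs from the original post and the little-endian integer packer
uint32_packl:

void float32_packl(char n[4], float32 f)
{
    union {
        float32 f;
        uint32 ui;
    } u;                    /* both members overlay the same 4 bytes */

    u.f = f;                /* store the float ... */
    uint32_packl(n, u.ui);  /* ... then read its bits back as an integer */
}

Whether volatile is also needed is debated a few messages further down.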

… and so will the compiler be. However the above code
is broken. You must do this:

*(uint32*)(void*)&f

and it may fail if the size and alignment of float32 and
uint32 are not the same.

On Sun, 2006-02-12 at 21:09 -0800, Sam Lantinga wrote:

Perhaps not the most elegant way of accomplishing this, but this works
fine for any data type as long as the data sizes match:

void float32_packl(char *n, float32 f) {
uint32 ui = (uint32) &f;
uint32_packl(n, ui);
}

Ah, I didn’t realise it could be that simple. Thanks!

Be really careful with this. Optimizing compilers will sometimes keep
floating point values in the floating point stack, so (uint32)&f will
be garbage in this case.


John Skaller
Felix, successor to C++: http://felix.sf.net

Thanks for all the help so far. The situation is far more fragile than
I had previously thought. Is there ANY safe assignment in C? :-)

Ok, so, can these functions be made any ‘safer’ or is this as good
as it’s going to get?

void float32_packl(char n[4], float32 f)
{
uint32 ui = *(uint32 *)(void *) &f;
n[0] = ui & 0xff; ui >>= 8;
n[1] = ui & 0xff; ui >>= 8;
n[2] = ui & 0xff;
n[3] = ui >> 8;
}

void float32_packb(char n[4], float32 f)
{
uint32 ui = *(uint32 *)(void *) &f;
n[3] = ui & 0xff; ui >>= 8;
n[2] = ui & 0xff; ui >>= 8;
n[1] = ui & 0xff;
n[0] = ui >> 8;
}

void float32_unpackl(const char n[4], float32 *ui)
{
uint32 t;
float32 t2;
t = (unsigned char)n[3]; t <<= 8;
t += (unsigned char)n[2]; t <<= 8;
t += (unsigned char)n[1]; t <<= 8;
t += (unsigned char)n[0];
t2 = *(float32 *)(void *) &t;
*ui = t2;
}

void float32_unpackb(const char n[4], float32 *ui)
{
uint32 t;
float32 t2;
t = (unsigned char)n[0]; t <<= 8;
t += (unsigned char)n[1]; t <<= 8;
t += (unsigned char)n[2]; t <<= 8;
t += (unsigned char)n[3];
t2 = *(float32 *)(void *) &t;
*ui = t2;
}

On 2/13/06, skaller wrote:

On Sun, 2006-02-12 at 21:09 -0800, Sam Lantinga wrote:

Perhaps not the most elegant way of accomplishing this, but this works
fine for any data type as long as the data sizes match:

void float32_packl(char *n, float32 f) {
uint32 ui = (uint32) &f;
uint32_packl(n, ui);
}

Ah, I didn’t realise it could be that simple. Thanks!

Be really careful with this. Optimizing compilers will sometimes keep
floating point values in the floating point stack, so (uint32)&f will
be garbage in this case.

… and so will the compiler be. However the above code
is broken. You must do this:

    *(uint32*)(void*)&f

and it may fail if the size and alignment of float32 and
uint32 are not the same.

Ok, so, can these functions be made any ‘safer’ or is this as good
as it’s going to get?

Argh, gmail broke the indentation, sorry.

a1

Thanks for all the help so far. The situation is far more fragile than
I had previously thought. Is there ANY safe assignment in C? :-)

First rule: try to avoid casts;
Second rule: try to avoid implicit conversions;
Third rule: try to avoid C;
Fourth rule: goto Third rule;

:-)

Ok, so, can these functions be made any ‘safer’ or is this as good
as it’s going to get?

void float32_packl(char n[4], float32 f)
{
uint32 ui = *(uint32 *)(void *) &f;
n[0] = ui & 0xff; ui >>= 8;

Both 0xff and 8 are signed integers. This may work,
though I’m not sure – I never remember those rules :-)

Better to be explicit about it:

0xffu .. 8u

gets the arguments agreeing on sign, if not size.

The other problem is that you have an array of char.
You should use an array of unsigned char. C guarantees
unsigned char – and ONLY unsigned char – is a lossless
datatype.

All those times you blitted bytes around using arrays
of char … plain char may be signed, and 1’s complement
signed char has 0xff == 0x00 (they’re both zero), so arbitrary
byte values aren’t guaranteed to survive the round trip.
Unsigned 8 bit char has 256 distinct values, guaranteed!

On Mon, 2006-02-13 at 08:50 +0000, mal content wrote:

On 2/13/06, skaller wrote:


John Skaller
Felix, successor to C++: http://felix.sf.net
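
Putting those two points together, a sketch of the pack function with
unsigned constants and an unsigned char buffer (the pointer cast itself
still has the aliasing caveats raised further down the thread):

void float32_packl(unsigned char n[4], float32 f)
{
    uint32 ui = *(uint32 *)(void *) &f;   /* same cast as above */

    n[0] = ui & 0xffu; ui >>= 8u;
    n[1] = ui & 0xffu; ui >>= 8u;
    n[2] = ui & 0xffu;
    n[3] = ui >> 8u;
}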

Well, I do usually use unsigned char, I really don’t know why
I didn’t here.

Just out of interest, is there anywhere that this is documented
in a more humanly readable form than the ANSI standards?
They are a bit dense for casual reading.

a1

On 2/13/06, skaller wrote:

On Mon, 2006-02-13 at 08:50 +0000, mal content wrote:

On 2/13/06, skaller wrote:

Thanks for all the help so far. The situation is far more fragile than
I had previously thought. Is there ANY safe assignment in C? :-)

First rule: try to avoid casts;
Second rule: try to avoid implicit conversions;
Third rule: try to avoid C;
Fourth rule: goto Third rule;

:-)

Ok, so, can these functions be made any ‘safer’ or is this as good
as it’s going to get?

void float32_packl(char n[4], float32 f)
{
uint32 ui = *(uint32 *)(void *) &f;
n[0] = ui & 0xff; ui >>= 8;

Both 0xff and 8 are signed integers. This may work,
though I’m not sure – I never remember those rules :-)

Better to be explicit about it:

    0xffu .. 8u

gets the arguments agreeing on sign, if not size.

The other problem is that you have an array of char.
You should use an array of unsigned char. C guarantees
unsigned char – and ONLY unsigned char – is a lossless
datatype.

All those times you blitted bytes around using arrays
of char … plain char may be signed, and 1’s complement
signed char has 0xff == 0x00 (they’re both zero), so arbitrary
byte values aren’t guaranteed to survive the round trip.
Unsigned 8 bit char has 256 distinct values, guaranteed!

mal content wrote:

Ok, so, can these functions be made any ‘safer’ or is this as good
as it’s going to get?

What’s the problem with printf, scanf? They are super portable
and you can always review your data without the need for yet
another inspection tool that knows about the data format.

.bill
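
For illustration, a rough sketch of the text route (fprintf/fscanf are
standard C; the nine-digit %.9g width ties in with the precision
discussion further down):

#include <stdio.h>

/* Sketch: one float per line as text.  %.9g keeps enough decimal
   digits to round-trip an IEEE-754 single precision value. */
int write_float_text(FILE *fp, float f)
{
    return fprintf(fp, "%.9g\n", f) > 0;
}

int read_float_text(FILE *fp, float *f)
{
    return fscanf(fp, "%f", f) == 1;   /* 1 == one value converted */
}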

Perhaps not the most elegant way of accomplishing this, but this works
fine for any data type as long as the data sizes match:

void float32_packl(char *n, float32 f) {
uint32 ui = (uint32) &f;
uint32_packl(n, ui);
}

Ah, I didn’t realise it could be that simple. Thanks!

Be really careful with this. Optimizing compilers will sometimes keep
floating point values in the floating point stack, so (uint32)&f will
be garbage in this case.

… and so will the compiler be. However the above code
is broken. You must do this:

*(uint32*)(void*)&f

and it may fail if the size and alignment of float32 and
uint32 are not the same.

By definition uint32 and float32 are the same size. That is the reason
for the 32 following the uint and float parts :-). These types are not
built in to C/C++; they are typedef’ed somewhere in the code, and hopefully
a test was included to make sure that sizeof(uint32) == 4 and
sizeof(float32) == 4.

Is the alignment of the two types even a question? No matter how you
read them, they wind up in a register as a contiguous 32 bit value. If
you read the first byte of either type you get the first byte of either
type. If you declare them as alternate views of the same bytes of RAM
using a union they must map to the same memory if they are the same
size.

In other words, I don’t think this is a problem.

	Bob Pendleton

On Mon, 2006-02-13 at 19:26 +1100, skaller wrote:

On Sun, 2006-02-12 at 21:09 -0800, Sam Lantinga wrote:



mal content wrote:

Ok, so, can these functions be made any ‘safer’ or is this as good
as it’s going to get?

What’s the problem with printf, scanf? They are super portable
and you can always review your data without the need for yet
another inspection tool that knows about the data format.

Are you asking why one would not just convert the values to text? If so,
there are reasons not to do that.

  1. Size. The text representation is generally larger than the binary
    representation. And, the text representation has a variable size while
    the binary representation has a fixed size. A variable size requires
    extra data, or a marker, be added to the representation so that you know
    how big it is. That overhead increases the size even more. Size and
    consistency of size can be important.

  2. Precision. The printf form of a floating point number is not often
    the exact value that was stored in the binary value. Errors are
    introduced because of rounding and the differences between the set of
    repeating fractions in the binary and decimal number systems.

You can just cast the value to some convenient unsigned integer type
and print it out in a fixed length hexadecimal format. That retains the
value of all the bits and has a fixed length. But, it is twice as long
as the binary format.
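
A sketch of that idea, reusing the uint32/float32 typedefs and the
cast-through-void* trick from earlier in the thread (which has the
aliasing caveats discussed elsewhere); eight hex digits per value,
fixed length:

#include <stdio.h>

void float32_print_hex(FILE *fp, float32 f)
{
    uint32 ui = *(uint32 *)(void *) &f;
    fprintf(fp, "%08lx\n", (unsigned long) ui);   /* always 8 digits */
}

int float32_scan_hex(FILE *fp, float32 *f)
{
    unsigned long ul;
    uint32 ui;

    if (fscanf(fp, "%lx", &ul) != 1) return 0;
    ui = (uint32) ul;
    *f = *(float32 *)(void *) &ui;
    return 1;
}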

	Bob Pendleton

On Mon, 2006-02-13 at 15:27 +0200, Vassilis Virvilis wrote:
.bill


Bob Pendleton wrote:

  1. Size. The text representation is generally larger than the binary
    representation. And, the text representation has a variable size while
    the binary representation has a fixed size. A variable size requires
    extra data, or a marker, be added to the representation so that you know
    how big it is. That overhead increases the size even more. Size and
    consistency of size can be important.

You can always compress the text. I don’t believe that the penalty
of text compression is that big. In the common case speed is also not
a problem. As for the markers the same goes for the binary representation,
you need some sort of header. In text mode you also need it to know
how many conversions you need to make, but you don’t actually need the byte
count.

  2. Precision. The printf form of a floating point number is not often
    the exact value that was stored in the binary value. Errors are
    introduced because of rounding and the differences between the set of
    repeating fractions in the binary and decimal number systems.

This one I will buy but still I believe it is an extreme case.

In my experience, binary formats / protocols were
really bad designs at birth that quickly turned into a
real PITA once interoperability between different computers became
a requirement. Lots of specialized single-use tools, lots of weird,
difficult-to-debug bugs, and so on. To my mind it really isn’t
worth it, neither for the speed nor for the size.

http://www.faqs.org/docs/artu/ch05s01.html

.bill

Bob Pendleton wrote:

  2. Precision. The printf form of a floating point number is not often
    the exact value that was stored in the binary value. Errors are
    introduced because of rounding and the differences between the set of
    repeating fractions in the binary and decimal number systems.

glibc supports %a for the format string; it’s not really portable nor
ANSI, though.

clemens
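
A tiny sketch of the %a route (C99 hexadecimal floating point; glibc
and most modern libcs handle it, older compilers may not):

#include <stdio.h>
#include <stdlib.h>

/* %a prints the exact bits of a double as a hex float, e.g.
   0x1.91eb851eb851fp+1, and strtod() (C99) parses it back. */
int main(void)
{
    double d = 3.14, back;
    char buf[64];

    sprintf(buf, "%a", d);        /* exact, but not C89/"ANSI" */
    back = strtod(buf, NULL);
    printf("%s -> %s\n", buf, back == d ? "round-trips" : "lost bits");
    return 0;
}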

  2. Precision. The printf form of a floating point number is not
    often the exact value that was stored in the binary value. Errors
    are introduced because of rounding and the differences between the
    set of repeating fractions in the binary and decimal number systems.

That’s true in the most general case. However, if you know that you’re
using IEEE-754 floating-point values at both ends (which is very
likely), and you don’t need to preserve non-numeric beasts such as
signalling NaNs and the like, then you can be confident that 9 decimal
digits is sufficient to preserve a 32-bit floating-point value, and 17
decimal digits is sufficient to preserve a 64-bit double. (See “The
IEEE Standard” section of “What Every Computer Scientist Should Know
About Floating-Point Arithmetic”.)
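
A quick sketch of that claim for 32-bit floats (assuming IEEE-754
singles and a C99-ish libc):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    float f = 0.1f, back;
    char buf[64];

    sprintf(buf, "%.9g", f);            /* 9 significant decimal digits */
    back = (float) strtod(buf, NULL);
    printf("%s %s\n", buf, back == f ? "round-trips" : "does not");
    return 0;
}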

b

I do agree, I like text formats but unfortunately, nobody has come
up with a widespread text-based 3D model file format supported
by my favourite tools. I’m stuck with reading and writing 3DS files
for now. I hope this changes in the future but until that day, I’m
packing and unpacking floats…

a1

On 2/13/06, Vassilis Virvilis wrote:

In my experience, binary formats / protocols were
really bad designs at birth that quickly turned into a
real PITA once interoperability between different computers became
a requirement. Lots of specialized single-use tools, lots of weird,
difficult-to-debug bugs, and so on. To my mind it really isn’t
worth it, neither for the speed nor for the size.

Yes, generated with this:

#include <stdio.h>

int go(const char* s)
{
puts("#ifndef _UINT32_H");
puts("#define _UINT32_H");
puts("/* automatically generated - do not edit */");
printf("typedef %s uint32;\n", s);
fprintf(stderr, "uint32:%s\n", s);
puts("#endif");
return 0;
}

int main()
{
if ((sizeof(unsigned int) * 8) == 32) return go("unsigned int");
if ((sizeof(unsigned long) * 8) == 32) return go("unsigned long");
if ((sizeof(unsigned long long) * 8) == 32) return go("unsigned long long");

return 1;
}

and the floats with this:

#include <stdio.h>

int go(const char* s)
{
puts("#ifndef _FLOAT32_H");
puts("#define _FLOAT32_H");
puts("/* automatically generated - do not edit */");
printf("typedef %s float32;\n", s);
fprintf(stderr, "float32:%s\n", s);
puts("#endif");
return 0;
}

int main()
{
if ((sizeof(float) * 8) == 32) return go("float");
if ((sizeof(double) * 8) == 32) return go("double");
if ((sizeof(long double) * 8) == 32) return go("long double");

return 1;
}

On 2/13/06, Bob Pendleton wrote:

By definition uint32 and float32 are the same size. That is the reason
for the 32 following the uint and float parts :-). These types are not
built in to C/C++; they are typedef’ed somewhere in the code, and hopefully
a test was included to make sure that sizeof(uint32) == 4 and

It’s more than a matter of keeping values in a floating-point
stack. The C standard allows the compiler to assume that
such pointers don’t point to the same memory. You can see
this effect even with integers if they have different base types.

(char*) is special. It can legitimately alias with anything.

(int*) and (unsigned int*) can legitimately alias.

(int*) and (short*) can not legitimately alias. The compiler is
free to assume that the memory is distinct. The compiler need
not bother trying to prove anything about aliasing once it sees
that the base types are different.

I’m not 100% sure, but I don’t think casting through (char*)
will solve the problem. That is, I don’t suggest trying this:

intptr = (int*)(char*)longptr;

You can’t fix the problem with volatile. I’ve actually tried,
and I assure you that gcc doesn’t care. Fixes are:

a. compiler option like -fno-strict-aliasing
b. use a union
c. compiler-specific hack: __attribute__((may_alias))
d. use (char*) to access the memory

On 2/13/06, Sam Lantinga wrote:

Perhaps not the most elegant way of accomplishing this, but this works
fine for any data type as long as the data sizes match:

void float32_packl(char *n, float32 f) {
uint32 ui = (uint32) &f;
uint32_packl(n, ui);
}

Ah, I didn’t realise it could be that simple. Thanks!

Be really careful with this. Optimizing compilers will sometimes keep
floating point values in the floating point stack, so (uint32)&f will
be garbage in this case.

This violates the C standard type aliasing rules. I’ve seen it
generate bad code using gcc. (casting double* to short*)
To make your code work, do one of:

a. use a compiler option like -fno-strict-aliasing (cheating)
b. use a union instead of casting
c. use char*, which is allowed to alias with any type
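
A sketch of option (c): memcpy is the usual way to do it, since memcpy
is defined to copy the object representation byte by byte (i.e. it
accesses the memory as unsigned char) and most compilers inline it for
a fixed 4-byte size:

#include <string.h>

void float32_packl(char n[4], float32 f)
{
    uint32 ui;
    memcpy(&ui, &f, sizeof ui);   /* copy the float's bytes into ui */
    uint32_packl(n, ui);          /* then pack as before */
}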

Modern machines are quite consistent about float and
double, but not long double. The normal IEEE format is
used, in the appropriate endianness. Assuming you
don’t care about 16-bit machines, use these unions:

union {
unsigned u; /* note: do NOT use "long" */
float f;
}uf;

union {
unsigned long long ull; /* note: do NOT use "long" */
double d;
}ulld;

Watch out for alignment with that second one. Some ABIs
will pad a 64-bit value to fully align it. Other ABIs will only
add enough padding to align on a 32-bit boundary. You’d best
make sure that you make things naturally aligned so that
no sane compiler will add padding.
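
One old-school way to make the build fail if those assumptions break
(a sketch; the names are arbitrary, and a negative array size is a
compile-time error):

union ull_double {
    unsigned long long ull;
    double d;
};

typedef char check_double_size[sizeof(double) == 8 ? 1 : -1];
typedef char check_ull_size[sizeof(unsigned long long) == 8 ? 1 : -1];
typedef char check_no_padding[sizeof(union ull_double) == 8 ? 1 : -1];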

Don’t try this with “long double”. On i386 alone you are
likely to see:

64-bit with 32-bit alignment
64-bit with 64-bit alignment
80-bit with 16-bit alignment
96-bit with 32-bit alignment (16 unused bits)
128-bit with 64-bit alignment (48 unused bits)
128-bit with 128-bit alignment (48 unused bits)

On 2/12/06, Martin Storsjö wrote:

On Sun, 12 Feb 2006, mal content wrote:

I basically need to write ‘float32_packl’ (packs a 32 bit
floating point into a 4 byte char array in little endian
byte order).

Perhaps not the most elegant way of accomplishing this, but this works
fine for any data type as long as the data sizes match:

void float32_packl(char *n, float32 f) {
uint32 ui = (uint32) &f;
uint32_packl(n, ui);
}