[SDL 1.2] Hardware flipping is slow

Hello, all

In SDL 1.2 on my PC, hardware flipping via
SDL_Flip() on a hardware surface with SDL_DOUBLEBUF
is about three times slower than updating a hardware
surface without SDL_DOUBLEBUF via

    SDL_UpdateRect(surface, 0, 0, 0, 0).

In both cases, I am creating the surface with
the SDL_HWSURFACE flag.
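
A comparison like this could be timed roughly as
follows. This is only a sketch under assumptions, not
the exact code behind the numbers above: the names
present_fn, avg_ms_per_call, count_only and the
USE_SDL12 macro are hypothetical, and the timer is
POSIX clock_gettime(). Only SDL_Flip() and
SDL_UpdateRect() are real SDL 1.2 calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() */
#endif
#include <stdio.h>
#include <time.h>

/* Hypothetical callback type: presents one frame, returns 0 on success. */
typedef int (*present_fn)(void *ctx);

/* Calls present(ctx) `frames` times and returns the average wall-clock
 * time per call in milliseconds, or -1.0 on bad input or a failed call. */
static double avg_ms_per_call(present_fn present, void *ctx, int frames)
{
    struct timespec t0, t1;
    int i;

    if (present == NULL || frames <= 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < frames; i++) {
        if (present(ctx) != 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return ((t1.tv_sec - t0.tv_sec) * 1000.0 +
            (t1.tv_nsec - t0.tv_nsec) / 1e6) / frames;
}

/* Stand-in presenter for trying the helper without a display:
 * it only increments the int counter that ctx points to. */
static int count_only(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}

#ifdef USE_SDL12  /* hypothetical macro: define when building against SDL 1.2 */
#include "SDL.h"

/* Double-buffered path: ctx is a surface created with SDL_DOUBLEBUF. */
static int present_flip(void *ctx)
{
    return SDL_Flip((SDL_Surface *)ctx);
}

/* Single-buffered path: update the whole surface. */
static int present_update(void *ctx)
{
    SDL_UpdateRect((SDL_Surface *)ctx, 0, 0, 0, 0);
    return 0;
}
#endif
```

With SDL available, one would call something like
avg_ms_per_call(present_flip, screen, 300) on the
double-buffered surface and
avg_ms_per_call(present_update, screen, 300) on the
other one, after checking screen->flags to see which
of SDL_HWSURFACE and SDL_DOUBLEBUF were actually
granted.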

Is this the expected behavior, or should hardware
double-buffering be faster than, or at least as fast
as, SDL_UpdateRect?

My PC is an AMD A4-3400 with an AMD Radeon HD 6410
GPU.

P.S.: I have to use SDL 1.2 because DOSBox uses it,
and I am working on a patch for pixel-perfect
scaling in DOSBox.