Help with displaying a YUV overlay

Hello, I have been looking at some code for V4L capture,
and capture now works for me. Next, I want to display
the video from my PCTV card. I have read that the overlay
method is a good way to do this, but there are two things
I don't understand.
First:
Is overlay implemented with DMA? If it is, does that
mean I can't get at the video data, because it is
transferred over the PCI bus directly to video memory?
In that case, how do I get the content in video memory
onto the screen?
Second:
I have read that SDL can display a YUV overlay, but when
I looked at the SDL documentation I found that a memcpy()
is still needed to display a YUV overlay surface, as
follows:
//memcpy(overlay->pixels, map, (width * height * 3) /
As you know, memcpy() is a slow way to transfer
data. So my questions are: Will displaying the overlay
with SDL take up most of the CPU time? How can I
use hardware acceleration for YUV overlay display
with SDL? Where can I find examples or routines that
do this?
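
To make the second question concrete, here is a rough sketch of the
copy step I mean. It is only an illustration under my own assumptions:
the captured frame is planar YUV 4:2:0, tightly packed, with its planes
in the same order as the destination, and the destination rows may be
padded. The names yuv420_frame_size, copy_plane and copy_yuv420_frame
are hypothetical helpers of mine, not SDL or V4L functions. The dst and
dst_pitch arguments are meant to play the role of the pixels and
pitches arrays of an SDL overlay, but no SDL call is made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes in one planar YUV 4:2:0 frame: a full-size Y plane plus two
 * quarter-size chroma planes, i.e. width * height * 3 / 2.
 * Assumes even width and height. */
static size_t yuv420_frame_size(int width, int height)
{
    assert(width > 0 && height > 0);
    assert(width % 2 == 0 && height % 2 == 0);
    return (size_t)width * (size_t)height * 3 / 2;
}

/* Copy one plane row by row. dst_pitch is the distance in bytes between
 * destination rows and may be larger than row_bytes if rows are padded. */
static void copy_plane(uint8_t *dst, int dst_pitch,
                       const uint8_t *src, int row_bytes, int rows)
{
    for (int y = 0; y < rows; y++) {
        memcpy(dst + (size_t)y * (size_t)dst_pitch,
               src + (size_t)y * (size_t)row_bytes,
               (size_t)row_bytes);
    }
}

/* Hypothetical helper: copy a tightly packed 4:2:0 frame into three
 * destination planes. The source plane order is assumed to match the
 * destination plane order. */
static void copy_yuv420_frame(uint8_t *dst[3], const int dst_pitch[3],
                              const uint8_t *src, int width, int height)
{
    const uint8_t *p = src;

    /* Luma plane: width x height bytes. */
    copy_plane(dst[0], dst_pitch[0], p, width, height);
    p += (size_t)width * (size_t)height;

    /* First chroma plane: (width / 2) x (height / 2) bytes. */
    copy_plane(dst[1], dst_pitch[1], p, width / 2, height / 2);
    p += (size_t)(width / 2) * (size_t)(height / 2);

    /* Second chroma plane: same size as the first. */
    copy_plane(dst[2], dst_pitch[2], p, width / 2, height / 2);
}
```

If the destination pitch equals the row width, this does the same work
as one memcpy() of yuv420_frame_size(width, height) bytes, so every
frame still costs one full CPU copy. That copy is what I would like to
avoid or make cheaper.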

Any help would be appreciated. Thanks in advance!