Just to make it clear before I continue: if we ever intend to get an
even remotely acceptable implementation of this in hardware, it’d
have to be a custom GPU (for lack of a better name) with dedicated
raytracing hardware. So from now on, whenever I say GPU, I mean that
kind of GPU. Current ones simply won’t do the job no matter what.
Branches aren’t the #1 enemy, shared data is 
Branches are a problem when parallelism is achieved via SIMD, because
branching completely breaks the assumptions under which SIMD works
(which is why in shaders you have to avoid branches like the plague).
If you ever intend to push raytracing into the many hundreds
(thousands?) of frames per second, you will probably end up needing
SIMD no matter how hard you try, either for performance reasons or
for cost reasons (you can only cram in so many cores before it
becomes too costly).
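To make the divergence problem concrete, here’s a tiny C++ sketch
(all names made up): when the lanes of a SIMD batch disagree on a
branch, the hardware has to execute both paths with per-lane masks,
so you pay for the sum of the branches instead of the cheaper one.

    // Sketch of SIMD branch divergence; all names are illustrative.
    #include <cmath>

    struct Ray { float t = 0.0f; bool hit = false; };

    static float expensiveShade(const Ray& r) { return std::sqrt(r.t) * 0.5f; }
    static float background()                 { return 0.1f; }

    // Divergent: in a SIMD batch some lanes take one path and some the
    // other, so the hardware runs BOTH paths under per-lane masks.
    float shadeDivergent(const Ray& r) {
        if (r.hit) return expensiveShade(r);
        return background();
    }

    // Select instead of branch: both sides are evaluated for every lane
    // and blended by a mask -- the form SIMD hardware actually wants.
    float shadeSelect(const Ray& r) {
        return r.hit ? expensiveShade(r) : background();
    }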
Anyway, if you consider the raytracing step, you’d need to take into
account that the only thing that would ever be touching the primitive
data is the raycasting unit. As such, the only thing you’d need to
worry about is data shared among rays, and nothing else. Furthermore,
at this point in the process the data is read-only, so you don’t even
need to worry about it being modified, which allows for a lot of
assumptions that can make it even faster.
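A minimal sketch of that read-only contract (hypothetical types):
once tracing starts, nothing writes the primitive data, so any number
of rays can read it concurrently with no locks and no write-related
cache traffic.

    #include <vector>

    struct Sphere { float cx, cy, cz, radius; };

    struct Scene {
        std::vector<Sphere> primitives;  // built once, before tracing
    };

    // Taking the scene by const reference encodes the rule: every ray
    // (or thread, or SIMD lane) shares the same data with zero
    // synchronization, because nothing can modify it mid-trace.
    bool traceRay(const Scene& scene) {
        for (const Sphere& s : scene.primitives) {
            (void)s;  // intersection test would go here, read-only
        }
        return false;
    }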
You’re now involving the GPU/CPU in an incredibly high number of context
switches; the only way ray tracing will work properly is if it’s self-contained
in a massively parallel system, either a card of small CPUs or a
computer with many CPUs. Handing it back and forth to GPUs/CPUs is asking for
trouble. I know people have done work on this, but it’s not an optimal
solution, it’s a solution trying to use the hardware that already exists.
Um, no? That’s the worst thing one could ever do, and in fact why
immediate mode was dropped from OpenGL.
The CPU should be limited to just passing a list of primitives (or
list of VBOs, or instance lists, or whatever - you get the idea, same
as we do these days). Once that list is complete everything would run
on the GPU. Yes, this means you need to store such lists in video
memory, but hey, it needs to be stored somewhere after all, right?
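To sketch the interface shape I mean (none of these names exist in
any real API): the CPU hands the list over once, and after that each
frame is a single self-contained command.

    #include <utility>
    #include <vector>

    struct Sphere { float cx, cy, cz, radius; };

    class RtDevice {
    public:
        // One upload when the scene actually changes; the list then
        // lives in video memory, owned by the raytracing hardware.
        void uploadScene(std::vector<Sphere> primitives) {
            scene_ = std::move(primitives);
        }

        // One command per frame -- no per-primitive CPU/GPU round
        // trips, the same lesson that killed immediate-mode OpenGL.
        void traceFrame(int width, int height) {
            (void)width; (void)height;  // whole trace runs on-device
        }

    private:
        std::vector<Sphere> scene_;  // stand-in for video memory
    };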
As you can see, the way I view it, the entire raytracing process would
happen in the video hardware. That’s about as fast as it can get, and
I’d say it’s probably close to 100% optimal.
Take it from somebody who built his own ray tracer – the vast bulk of the
time is in collisions, and I do an incredible amount of trickery to reduce
those. The time spent in texture lookups – and doing all the rastering
effects like normal or spec mapping – is nothing. Not even worth passing
off to a GPU.
This is why I talk about a raytracing unit. Basically you have two
things: the raytracing unit takes care of the collisions, while the
shader unit does all those raster effects you talk about. And don’t
be fooled into hardcoding the effects here; in practice you’ll want
to provide the same flexibility that pixel shaders have (quirks
unique to raytracing aside, like the lack of a depth buffer).
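A rough sketch of that split (made-up types, not any real API): the
fixed-function unit only produces hit records, and a programmable
stage, as flexible as a pixel shader, decides what they look like.

    #include <functional>

    struct Hit   { float t; int primitiveId; float u, v; };
    struct Color { float r, g, b; };

    // Stage 1: the fixed-function raycasting unit -- collisions only.
    // Stubbed here; in hardware this is the dedicated unit.
    static Hit castRay() { return {1.0f, 0, 0.5f, 0.5f}; }

    // Stage 2: programmable -- the "pixel shader" of this pipeline.
    // Normal mapping, spec mapping and friends all live in here.
    using HitShader = std::function<Color(const Hit&)>;

    Color shadePixel(const HitShader& shader) {
        return shader(castRay());
    }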
Raytracing does make some algorithms simpler compared to rasterizing
(like shadows and mirrors). Other algorithms remain just as complex,
though. You really don’t want to underestimate how demanding it could
get in the hands of an entire development team aiming for fully
detailed photorealism.
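Shadows are a good example of the easy case (a toy sketch, with the
occlusion query stubbed out): the whole test is literally one more
ray toward the light, with none of the shadow-map machinery that
rasterizers need.

    struct Vec3 { float x, y, z; };

    // "Does anything sit between the point and the light?" -- in the
    // scheme above this is a single query to the raycasting unit.
    static bool occluded(const Vec3& point, const Vec3& lightPos) {
        (void)point; (void)lightPos;  // stub for the hardware query
        return false;
    }

    // No shadow maps, no resolution problems, no bias tweaking: if the
    // shadow ray hits anything, the point is in shadow. That's it.
    float directLight(const Vec3& point, const Vec3& lightPos) {
        return occluded(point, lightPos) ? 0.0f : 1.0f;
    }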
Note EVERY hit bounces rays, and multiple times at that. Every light source
starts another ray (after culling). That’s why it’s such a hard nut to crack.
I imagine the hardest nut to crack is having to check collisions
against all primitives, not so much the bounced and retraced rays.
Again, all the more reason to have a dedicated raycasting unit that
takes care of all that much faster than a generic processing unit
(CPU or GPU) could.
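In its naive form that hot loop looks like this (illustrative types):
every ray tests every primitive, O(rays × primitives), which is
exactly what a dedicated unit plus an acceleration structure such as
a BVH exists to collapse.

    #include <algorithm>
    #include <cmath>
    #include <limits>
    #include <vector>

    struct Vec3   { float x, y, z; };
    struct Ray    { Vec3 o, d; };        // d assumed normalized
    struct Sphere { Vec3 c; float r; };

    // Standard ray/sphere test: nearest positive t, or infinity.
    static float intersect(const Sphere& s, const Ray& ray) {
        Vec3 oc{ray.o.x - s.c.x, ray.o.y - s.c.y, ray.o.z - s.c.z};
        float b = oc.x * ray.d.x + oc.y * ray.d.y + oc.z * ray.d.z;
        float c = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z - s.r * s.r;
        float disc = b * b - c;
        if (disc < 0.0f) return std::numeric_limits<float>::infinity();
        float t = -b - std::sqrt(disc);
        return t > 0.0f ? t : std::numeric_limits<float>::infinity();
    }

    // The brute-force loop: every primitive, for every single ray.
    float nearestHit(const std::vector<Sphere>& scene, const Ray& ray) {
        float best = std::numeric_limits<float>::infinity();
        for (const Sphere& s : scene)
            best = std::min(best, intersect(s, ray));
        return best;
    }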
Even more OT, sorry list people. It’s an interesting topic, though.
Well, the subject is “Ray Tracing”, and to be fair the thread seemed
to focus more on the raytracing than on SDL for starters. If you want,
we can continue this discussion in private, though.