Need Help With Refactoring My Software Rendering Pipeline

This is my PS1-style rasterizer (written in C89, using SDL 2.0).
I am currently refactoring it, and I am somewhat stuck deciding how to approach this.

What I have is a typical game loop, similar to the pseudocode below:

while (game.is_running() == 1) {
  ProcessInput();
  Update();
  Render();
}

… where Update() does vertex transformations (matrix multiplications).
It’s the “vertex processor” or “vertex shader”, if you like.
It also checks whether a triangle is facing away from the camera (back-face culling).
In pseudocode, it looks like this:

void Update() {
  /* Transform vertex positions. */
  for (auto mesh : meshes) {
    for (auto face : mesh->faces) {
      auto transformed_face = TransformTriangleFace(face, world_transform);
      if (CullTriangleFace(transformed_face, camera_position) == true) {
        continue;
      }

      /* Note: "fragments" are things that are going to be rasterized. */
      fragment_buffer.push_back(transformed_face);
    }
  }
}

Render() sorts my triangle faces by their average depth, from back to front (painter's algorithm).
Then it projects them ("weak perspective projection", that is, a simple division by z) and finally "rasterizes" them (scan conversion, if you like).
In pseudocode, it looks like this:

void Render() {
  pixel_buffer.clear();

  /* Apply painter's algorithm. */
  SortTriangleFacesByAverageDepth(fragment_buffer);
  
  /* Perspectively project fragment (vertices) and convert to pixel coordinates ("Viewport transform"). */
  for (auto& fragment : fragment_buffer) {
    fragment = ProjectTriangleFace(fragment); /* Simple weak perspective projection with z-divide. */
    fragment = ConvertToPixelCoordinates(fragment);
  }

  /* Convert fragment raster points to int and rasterize the triangle. */
  for (auto& fragment : fragment_buffer) {
    /* ... */
    RasterizeTriangle(kAX, kAY, kBX, kBY, kCX, kCY, OnComputeColor,
                      OnDrawPixel, fragment);
  }

  /* Prevent infinite growth and overwrite old values. */
  fragment_buffer.reset_length(0);

  pixel_buffer.flip_vertically();
  UpdateSdlRgbSurface(pixel_buffer_data);
  UpdateSdlWindowSurface();
}

The problem is that this consumes too much memory:
I am allocating enough space up front to hold all the triangle faces of all my meshes.
Is there a better approach?

Allow me to be more concrete, using real snippets of code.

#include "core/game.h"

int main(void) {
  unsigned long system_time = 0UL;
  unsigned long simulation_time = 0UL;
  GameHandle game;
  Game_initialize(&game);
  Game_open_window(game);

  while (Game_is_running(game) == 1) {
    /* Cap framerate (30 FPS). */
    system_time = Game_get_milliseconds();
    Game_limit_framerate(game, system_time);

    Game_process_input();
    /* Ensure framerate independent movement. */
    while (simulation_time < system_time) {
      simulation_time += 16;
      Game_update(game, 16);
    }

    Game_render(game);
  }

  Game_shutdown(game);
  return 0;
}

So, as you can see, Game_update(game, 16) can run multiple times before Game_render(game) is reached.

That means the fragment buffer “explodes” (it ends up holding more fragments, i.e. world-transformed faces, than there are original faces) if I don’t cap its maximum size.
I don’t know a better way, other than implementing some kind of lock-and-wait mechanism.
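
One simple workaround I can think of (short of restructuring) would be to reset the count at the top of Game_update, so that repeated updates within a frame overwrite the previous fragments instead of appending to them. A sketch, not what my code below currently does:

void Game_update(GameHandle self, int milliseconds) {
  /* Discard fragments from any earlier update this frame; only the last
   * update before Game_render survives. */
  self->fragment_count = 0;

  /* ... transform, cull and push fragments as before ... */
}

For reference, here is my actual Game_update: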

void Game_update(GameHandle self, int milliseconds) {
  static float angle = 0.0f;
  Mesh* mesh = self->meshes;
  Mesh* const kMeshEnd = mesh + self->mesh_count;

  /* Update rotation angle. */
  angle += 1E-3f * (float)milliseconds;

  /* Apply world transform, cull (if applicable) and store result into fragment
   * buffer.
   */
  for (mesh = self->meshes; mesh != kMeshEnd; ++mesh) {
    TriangleFace* face =
        (TriangleFace*)DynamicArray_begin(mesh->triangle_faces);
    TriangleFace* const kFaceEnd =
        (TriangleFace*)DynamicArray_end(mesh->triangle_faces);
    TriangleFace fragment = *face;
    /* Compute world transform. */
    Transform4 const kScaling =
        CreateTransform4Scaling(mesh->size.x, mesh->size.y, mesh->size.z);
    Transform4 const kRotationX =
        CreateTransform4RotationX(mesh->orientation.x);
    Transform4 const kRotationY =
        CreateTransform4RotationY(mesh->orientation.y + angle);
    Transform4 const kRotationZ =
        CreateTransform4RotationZ(mesh->orientation.z);
    Transform4 const kTranslation =
        CreateTransform4Translation(self->translation.x + mesh->position.x,
                                    self->translation.y + mesh->position.y,
                                    self->translation.z + mesh->position.z);
    Transform4 world_transform = CreateIdentityTransform4();
    world_transform = ComposeTransform4(world_transform, kTranslation);
    world_transform = ComposeTransform4(world_transform, kScaling);
    world_transform = ComposeTransform4(world_transform, kRotationX);
    world_transform = ComposeTransform4(world_transform, kRotationY);
    world_transform = ComposeTransform4(world_transform, kRotationZ);

    for (; face != kFaceEnd; ++face) {
      TransformTriangleFace(face, &fragment, world_transform);
      fragment.texture = mesh->texture;

      if (self->cull == 1) {
        int skip =
            Cull(fragment.vertices[0].position, fragment.vertices[1].position,
                 fragment.vertices[2].position, self->camera_position);
        if (skip == 1) {
          continue;
        }
      }

      if (self->fragment_count <= self->triangle_count) {
        self->fragment_buffer[self->fragment_count] = fragment;
        ++self->fragment_count;
      }
    }
  }

  /* Apply painter's algorithm: sort the fragments gathered this update. */
  if (self->depth_sort == 1) {
    qsort(self->fragment_buffer, self->fragment_count, sizeof(TriangleFace),
          CompareAverageDepth);
  }
}

The artificial limitation occurs inside

      if (self->fragment_count <= self->triangle_count) {
        self->fragment_buffer[self->fragment_count] = fragment;
        ++self->fragment_count;
      }

However, this apparently leads to render artifacts (presumably because, when Game_update runs more than once per frame, the buffer already holds the previous update’s fragments and the cap drops most of the newer ones).

void Game_render(GameHandle self) {
  TriangleFace* fragment = self->fragment_buffer;
  TriangleFace* const kFragmentBufferEnd = fragment + self->fragment_count;
  int raster_points[6] = {0, 0, 0, 0, 0, 0};
  Transform4 const kTransform = CreateTransform4WorldWindowToRasterDevice(
      (float)self->pixel_buffer_width, (float)self->pixel_buffer_height);
  PixelBuffer_clear(self->pixel_buffer, 76, 82, 78);

  /* Perspectively project vertices and convert to pixel coordinates. */
  for (fragment = self->fragment_buffer; fragment != kFragmentBufferEnd;
       ++fragment) {
    size_t i = 0UL;
    for (i = 0UL; i < 3UL; ++i) {
      PROJECT_POINT(fragment->vertices[i].position.x,
                    fragment->vertices[i].position.y,
                    fragment->vertices[i].position.z, 1.0f);
      fragment->vertices[i].position = CombineTransform4AndTuple4(
          kTransform, fragment->vertices[i].position);
    }
  }

  /* Convert fragment positions to raster points. */
  for (fragment = self->fragment_buffer; fragment != kFragmentBufferEnd;
       ++fragment) {
    raster_points[0] = (int)fragment->vertices[0].position.x;
    raster_points[1] = (int)fragment->vertices[0].position.y;
    raster_points[2] = (int)fragment->vertices[1].position.x;
    raster_points[3] = (int)fragment->vertices[1].position.y;
    raster_points[4] = (int)fragment->vertices[2].position.x;
    raster_points[5] = (int)fragment->vertices[2].position.y;

    {
      int const kAX = CLAMP(raster_points[0], 0, self->pixel_buffer_width);
      int const kAY = CLAMP(raster_points[1], 0, self->pixel_buffer_height);
      int const kBX = CLAMP(raster_points[2], 0, self->pixel_buffer_width);
      int const kBY = CLAMP(raster_points[3], 0, self->pixel_buffer_height);
      int const kCX = CLAMP(raster_points[4], 0, self->pixel_buffer_width);
      int const kCY = CLAMP(raster_points[5], 0, self->pixel_buffer_height);

      if (self->fill_polygons == 0) {
        DrawTriangleLines(kAX, kAY, kBX, kBY, kCX, kCY, 255, 255, 255);
      } else {
        RasterizeTriangle(kAX, kAY, kBX, kBY, kCX, kCY, OnComputeColor,
                          OnDrawPixel, fragment);
      }
    }
  }

  self->fragment_count = 0;
  PixelBuffer_flip_vertically(self->pixel_buffer);
  UpdateSdlRgbSurface(self->pixel_buffer_data);
  UpdateSdlWindowSurface();
}

`self->fragment_count = 0` is another “artificial” limitation.

The thing is, I allocate space for 1024 fragments (TriangleFace objects) up front.

#ifndef MODEL_H_
#define MODEL_H_

#include <stddef.h>
#include "../freestanding/tuple_algebra.h"
#include "../standalones/dynamic_array.h"

typedef struct Vertex Vertex;
typedef struct TriangleFace TriangleFace;
typedef struct TriangleMesh TriangleMesh;

struct Vertex {
  Tuple4 position;
  Tuple4 texture_coordinates;
  Tuple4 vertex_normal;
};

struct TriangleFace {
  Vertex vertices[3];
};

struct TriangleMesh {
  DynamicArrayHandle triangle_faces; /* Array of `TriangleFace[N]`. */
};

void LoadTriangleMesh(TriangleMesh* triangle_mesh, char const* obj_filename);
void FreeTriangleMesh(TriangleMesh* triangle_mesh);
size_t GetTriangleFaceCount(TriangleMesh const* triangle_mesh);

#endif /* MODEL_H_ */
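
For what it's worth, I could also size the fragment buffer from the actual face counts instead of hard-coding 1024. A sketch (the allocation site, the include path and everything except the types and GetTriangleFaceCount from the header above are made up):

#include <stdlib.h>

#include "core/model.h"

/* Every face can survive culling at most once per frame, so the total
 * face count is an upper bound for the fragment buffer size. */
static TriangleFace* AllocateFragmentBuffer(TriangleMesh const* meshes,
                                            size_t mesh_count,
                                            size_t* out_capacity) {
  size_t total_faces = 0UL;
  size_t i = 0UL;
  for (i = 0UL; i < mesh_count; ++i) {
    total_faces += GetTriangleFaceCount(&meshes[i]);
  }
  *out_capacity = total_faces;
  return (TriangleFace*)malloc(total_faces * sizeof(TriangleFace));
}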

I’m not sure how your code works, but in principle your Game_update function should not do any rendering or rasterization, and hence should not allocate any textures. It should essentially just work on triangle coordinates. That should not be a lot of memory, even if you have hundreds (or thousands) of triangles.

Then only the render part will compute hidden faces and rasterize everything at each frame.


Looks like I failed to explain myself.
Let’s try another way.

I wrote a note-to-self in my “developer_notes.md” file.
I think I understand the rendering pipeline, but I keep getting confused because different people use different CG terminology.
Likewise, I think I understand the concepts well enough to rasterize Lara Croft (see above); however, as they say, the devil is in the details.

Without further ado, here it goes:

## Rendering Pipeline

0. Model() -> 3D object coordinates
1. World(3D object coordinates) -> 3D world coordinates
2. Light(3D world coordinates) -> light intensity value
3. View(3D world coordinates) -> 3D camera coordinates
4. Projection(3D camera coordinates) -> 2D world coordinates `(e.g., matching a point on a near/projection plane)`
5. Painter's algorithm(projected 2D world coordinates `(e.g., matching a point on a near/projection plane)`) -> projected 2D world coordinates `(e.g., matching a point on a near/projection plane)`
6. Clip(projected 2D world coordinates `(e.g., matching a point on a near/projection plane)`) -> projected 2D world coordinates `(e.g., matching a point on a near/projection plane)`
7. Viewport(projected 2D world coordinates `(e.g., matching a point on a near/projection plane)`) -> 2D pre-fragment coordinates
8. Rasterization(2D pre-fragment coordinates) -> 2D fragment coordinates.
9. Fragment shading/processing(2D fragment coordinates) -> 2D pixel coordinates
10. Draw pixel(2D pixel coordinate) -> pixel (on screen)

So each of these points represents a stage of the rendering pipeline.
And each stage has a “math-like function” of the form MyFunction(Input) -> Output.
The output of one stage is the input of the next stage below.

Commentary to the points:

  0. So here I start modelling the 3D object in Blender.
    I export it as an OBJ file.

  1. I load it up in my software renderer and apply scaling, rotation and translation to each 3D point to get 3D world coordinates.

  2. In the meantime, I also compute the light intensity using these transformed 3D world points.
    To compute the light intensity, I need an illumination model (e.g., light intensity := <N, L>, the Lambertian illumination model; see the sketch after this list).
    Then I apply that illumination model to each color component (shading).

  3. Now, I first position a camera by moving it from the world origin to another 3D world point via scaling, rotation and translation.
    Then, I invert this matrix/transformation and use it to convert each 3D point to camera coordinates (view transform).
    The camera-to-world transform is built the same way as the model-to-world transform.
    So if I invert this camera-to-world transform, I get the world-to-camera transform (which is what we need).

  4. Then, I project these camera coordinates onto the projection plane (the camera sits at the origin (0, 0, 0) in camera space):
    x* := H * x / z, y* := H * y / z.
    (Weak) perspective projection is essentially a division by z (assuming H, the distance from the camera to the projection plane, is 1).
    Now we have 2D world coordinates projected onto the projection plane.
    Or rather, 2D world coordinates on the projection plane, (x*, y*, 0), except that I keep the original z around: (x*, y*, z).

  5. Now, since I kept the z in (x*, y*, z), I use it to compute the average depth of each triangle.
    For this, I take the z values of the triangle's vertices and average them: (z1 + z2 + z3) / 3.
    Then I sort from back (e.g., z = 1000) to front (e.g., z = 1).

  6. If the view frustum is not normalized, I need to cut away triangles that fall outside the view frustum (camera frustum).
    For this, I need the original world coordinates from stage 1 (world_x, world_y, world_z) and have to set up bounding volumes.
    I would do this not at stage 6, but after stage 2 (after the world transform).
    If the coordinates are normalized, I can do this after the projection and normalization transform using (x*, y*, z).

  7. Now, these projected points in 2D world space need to be mapped onto a discrete pixel grid (e.g., a pixel buffer).
    Now we have “pre-fragment” coordinates.

  8. The rasterizer takes the “pre-fragment” coordinates and converts them to fragment coordinates.
    Fragments are “raw pixels”, if you like, that still need to be colored (“fragment shaded”).
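
As a concrete example for point 2, the Lambertian model boils down to clamping the dot product of the unit surface normal N and the unit direction towards the light L, then scaling each color channel by it. A sketch with made-up names (Tuple4 is from my tuple_algebra header, but DotProduct4 and this shading helper are hypothetical):

/* Lambertian diffuse shading: intensity = max(0, <N, L>), with N and L
 * assumed to be normalized; each color channel is scaled by it. */
static void ShadeLambertian(Tuple4 surface_normal, Tuple4 light_direction,
                            unsigned char* r, unsigned char* g,
                            unsigned char* b) {
  float intensity = DotProduct4(surface_normal, light_direction);
  if (intensity < 0.0f) {
    intensity = 0.0f; /* Faces turned away from the light get no diffuse light. */
  }
  *r = (unsigned char)((float)*r * intensity);
  *g = (unsigned char)((float)*g * intensity);
  *b = (unsigned char)((float)*b * intensity);
}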

The fragment shader is inside the rasterizer.
I think of it as a “callback function” that the rasterizer calls for each fragment.
Now we finally take all these fragments, apply a light intensity for each color channel and interpolate texture coordinates.

As an example, interpolating texture coordinates for each fragment (given the rasterizer provides us with areal/barycentric coordinates for each fragment) looks like this:

  /* ... */
  /* Interpolate the texture coordinates with the barycentric weights
   * (alpha, beta, gamma); note that V is flipped via (1 - v). */
  float const kInterpolatedU = alpha * u0 + beta * u1 + gamma * u2;
  float const kInterpolatedV =
      alpha * (1.0f - v0) + beta * (1.0f - v1) + gamma * (1.0f - v2);
  int const kWidth = Texture_get_width(texture);
  int const kHeight = Texture_get_height(texture);
  unsigned char* texels = Texture_get_data(texture);
  /* Map the interpolated UV to a texel index, clamped to the texture bounds. */
  int const position_x =
      CLAMP(ABSOLUTE_VALUE((int)(kInterpolatedU * kWidth)), 0, kWidth - 1);
  int const position_y =
      CLAMP(ABSOLUTE_VALUE((int)(kInterpolatedV * kHeight)), 0, kHeight - 1);
  return texels + ((position_x + position_y * kWidth) *
                   Texture_get_bytes_per_pixel(texture));
  /* ... */

Finally, we have the pixel out.

  10. The pixel is visible on your screen.

Okay, that’s how I think about it.
Is there anything wrong here?

Also, I have another relevant note that I made for myself:

### 01 Fragment

Assume points in 2D or 3D space have continuous coordinates.

A rasterizer converts these continuous coordinates to discrete ones.

We call these discrete coordinates from the rasterizer "fragments".

Think of fragments as "non-ready pixels", "raw pixels" or "soon-to-be pixels", if you like.

Fragment shaders/processors compute the final pixel color (e.g., by interpolating texture coordinates).

A fragment shader resides inside a rasterizer.
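
In code, I picture the "fragment shader as a callback" idea roughly like this (a generic sketch; my real RasterizeTriangle/OnComputeColor signatures differ):

/* The rasterizer scan-converts the triangle; for every covered pixel
 * ("fragment") it computes the barycentric weights and hands them to the
 * callback, which decides the final color. */
typedef void (*FragmentShaderFn)(void* user_data, int x, int y,
                                 float alpha, float beta, float gamma,
                                 unsigned char* out_r, unsigned char* out_g,
                                 unsigned char* out_b);

The rasterizer then writes (out_r, out_g, out_b) into the pixel buffer at (x, y).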

Update:

## Rendering Pipeline

0. Model() -> 3D object coordinates
1. World(3D object coordinates) -> 3D world coordinates
2. Cull(3D world coordinates) -> 3D world coordinates (frustum and/or back-face culling)
3. Light(3D world coordinates (with or without culling)) -> light intensity value
4. View(3D world coordinates (with or without culling)) -> 3D camera coordinates
5. Projection(3D camera coordinates) -> 2D world coordinates `(e.g., matching a point on a near/projection plane)`
6. Painter's algorithm(projected 2D world coordinates `(e.g., matching a point on a near/projection plane)`) -> projected 2D world coordinates `(e.g., matching a point on a near/projection plane)`
7. Viewport(projected 2D world coordinates `(e.g., matching a point on a near/projection plane)`) -> 2D pre-fragment coordinates
8. Rasterization(2D pre-fragment coordinates) -> 2D fragment coordinates.
9. Fragment shading/processing(2D fragment coordinates) -> 2D pixel coordinates
10. Draw pixel(2D pixel coordinate) -> pixel (on screen)

Okay, I think this is more suitable for my software renderer, since I have no normalization in my perspective transform.

Or wait, according to @Sanette, stage 2 (culling) should come after stage 5 (projection), right?
So the position I had before was alright, maybe.
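
For context, my back-face cull in world space is basically the usual cross/dot product test. A sketch (not my exact Cull function; it assumes counter-clockwise winding for front faces):

/* A triangle faces away from the camera when its face normal and the
 * vector from the triangle towards the camera point in opposite
 * directions, i.e. their dot product is negative. */
static int IsBackFacing(Tuple4 a, Tuple4 b, Tuple4 c, Tuple4 camera_position) {
  float const kAbx = b.x - a.x, kAby = b.y - a.y, kAbz = b.z - a.z;
  float const kAcx = c.x - a.x, kAcy = c.y - a.y, kAcz = c.z - a.z;
  /* Face normal = AB x AC (counter-clockwise winding assumed). */
  float const kNx = kAby * kAcz - kAbz * kAcy;
  float const kNy = kAbz * kAcx - kAbx * kAcz;
  float const kNz = kAbx * kAcy - kAby * kAcx;
  /* Vector from the triangle towards the camera. */
  float const kVx = camera_position.x - a.x;
  float const kVy = camera_position.y - a.y;
  float const kVz = camera_position.z - a.z;
  return (kNx * kVx + kNy * kVy + kNz * kVz) < 0.0f;
}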

Something to think about is that a lot of software renderers back in the day cheated like hell.

IDK what #6 is doing, but the term “painter’s algorithm” just means things are drawn in the order they’re sent to the renderer, with the background being “painted” first, then progressively closer and closer objects. Sometimes people just mean the first part (draw stuff in the order it’s sent to the renderer), however.

As to culling, that should happen as early as possible (don’t confuse object culling with backface culling). Clipping is what happens after projection, where the polygons are clipped to the viewport.

edit: your game update function shouldn’t be doing vertex transformations etc. Do that all in the renderer. The game update should be moving objects if needed, and doing other game logic (AI, physics, input, etc.), and then only when it’s time to render a frame do the vertex transformations on objects and level pieces that remain after culling. There’s no reason whatsoever to be doing multiple vertex transformations per frame.
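
Roughly, something like this (just a sketch with placeholder comments; self->angle and the helpers are stand-ins for whatever your code actually has):

void Game_update(GameHandle self, int milliseconds) {
  /* Game logic only: advance angles, move objects, AI, physics, input. */
  self->angle += 1E-3f * (float)milliseconds;
}

void Game_render(GameHandle self) {
  /* Once per frame, per mesh: build the world transform from the current
   * game state, then transform -> cull -> project -> viewport -> rasterize
   * each face. Nothing has to be kept around between frames, so at most
   * one frame's worth of triangles is ever in flight. */
}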

edit 2: In the code you posted, there seems to be some confusion relating to terminology. A “fragment” isn’t a transformed vertex, it’s a shaded pixel. In GPU land, the fragment shader is run for every rasterized pixel. That’s why in DirectX/HLSL land a fragment shader is called a pixel shader.
