Design advice for a config serialization interface for parsing flags?

josiest · November 3, 2022, 6:31am

As a hobby project, I’ve been working on a small-scoped (de)serialization library for parsing config files specifically related to SDL. The main scope of the library is to sort of act as an interface between toml++ and std::expected (although I’m actually using tl::expected since it’s readily available, and even has monadic functions!). I’m currently facing a sort of design dilemma, and I wanted to see if anyone would be willing to offer some insight.

With the current interface of my library (called raisin) you can write a function load_value as a customization point, so you can do things like this:

raisin::expected<SDL_Point, std::string>
load_value(toml::table const & config, std::string const & variable_path)
{
    SDL_Point p;
    auto result = raisin::subtable(config, variable_path)
        .and_then(raisin::read_value("x", p.x))
        .and_then(raisin::read_value("y", p.y));

    if (!result) {
        return raisin::unexpected{ result.error() };
    }
    return p;
}

// Read from toml:
// [player-spawn]
// x = 32
// y = 19
SDL_Point player_spawn_point;
raisin::parse_file(config_path)
    .and_then(raisin::read_value("player-spawn", player_spawn_point))
    .and_then(/* load more config settings ... */);

Where raisin::read_value returns a monadic function taking and returning a toml::table, and writes the loaded value to its second parameter when it’s successful. It looks like this:

inline namespace raisin {
template<value_loadable value_t>
auto read_value(std::string const & variable_path, value_t & output)
{
    return [&variable_path, &output](toml::table const & config)
        -> expected<toml::table, std::string>
    {
        auto result = load_value(config, variable_path);
        if (!result) {
            return unexpected{ result.error() };
        }
        output = *result;
        return config;
    };
}
}

Here, value_loadable essentially just requires that the call load_value(config, variable_path) is well-formed, i.e. that it compiles for the given type.

Right now the dilemma I’m facing has to do with loading config flags like SDL_INIT_*. Let’s say I have a map of strings to SDL subsystem flags that looks something like:

std::unordered_map<std::string, std::uint32_t> const all_subsystem_flags{
    { "video", SDL_INIT_VIDEO },
    { "audio", SDL_INIT_AUDIO },
    // ...
};

And a toml config file that looks like this:

[init]
subsystems = ["video", "audio", "unicorns"]

The raisin library currently has a function load_flags, which has quite a messy signature but allows you to do this:

std::vector<std::string> invalid_flagnames;
auto flag_result = raisin::load_flags(config, "init.subsystems",
                                      all_subsystem_flags,
                                      std::back_inserter(invalid_flagnames));

This offers a pretty robust interface in the sense that you can customize what you expect the config to look like, and you can keep track of any flags specified in the config that you don’t expect. The last two parameters are also type-constrained template types that don’t force you to use std::unordered_map and std::back_inserter(std::vector).

And it has a template parameter with a default value that limits the number of names you can write to the invalid name output, though this is still insecure if you give it something like a pointer to an array. (Now that I think about it, I could probably write an overload that takes a span or an output_range, and then constrain the overload that takes std::weakly_incrementable so that it will only match insert_iterators.)
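To make the behaviour described above a bit more concrete, here is a rough sketch of what the core of such a flag-combining function could look like. It is not raisin's actual load_flags: the name combine_flags, its parameters, and the flag values in the example are made up for illustration, and it works on a plain list of names instead of a toml array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of raisin): ORs together the flags whose names
// are found in flag_map. Unknown names are written to invalid_out, but at most
// max_invalid of them, so a fixed-size destination can be protected by
// choosing a small enough limit.
template<std::size_t max_invalid = 16,
         typename name_range, typename flag_map_t, typename output_it>
std::uint32_t combine_flags(name_range const & names,
                            flag_map_t const & flag_map,
                            output_it invalid_out)
{
    std::uint32_t flags = 0;
    std::size_t written = 0;
    for (auto const & name : names) {
        auto const found = flag_map.find(name);
        if (found != flag_map.end()) {
            flags |= found->second;
        } else if (written < max_invalid) {
            *invalid_out++ = name;
            ++written;
        }
    }
    return flags;
}

// Usage with made-up flag values (not the real SDL_INIT_* constants).
inline void example_usage()
{
    std::unordered_map<std::string, std::uint32_t> const known{
        { "video", 0x20u }, { "audio", 0x10u } };
    std::vector<std::string> const requested{ "video", "audio", "unicorns" };
    std::vector<std::string> invalid;

    auto const flags = combine_flags(requested, known, std::back_inserter(invalid));
    assert(flags == 0x30u);
    assert(invalid.size() == 1 && invalid.front() == "unicorns");
}
```

The limit only caps how many names are written. It cannot know how large the destination really is, which is why a raw pointer is still unsafe here.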

I was thinking about adding an overload that doesn’t take an output iterator in case you don’t care about keeping track of invalid flag names, but that’s not the problem. If I want to be consistent with the style of interface this library offers, I want to be able to do something like

std::uint32_t subsystem_flags;
raisin::subtable("init")
    .and_then(raisin::read_flags("subsystems", subsystem_flags));

I’ve been stewing on this for quite some time now, and the solution I’ve thought of so far is to make a class that keeps a reference to the name-to-flag map, and can generate monadic functions that also reference this map. So using it might look like

raisin::flag_loader loader{ all_subsystem_flags };
raisin::subtable("init")
    .and_then(loader.read_flags("subsystems", subsystem_flags));
    // it might also have an overload that keeps track of invalid names
    // .and_then(loader.read_flags("subsystems", subsystem_flags,
    //                             std::back_inserter(invalid_flagnames)));

But creating a class that makes anonymous functions feels kind of clunky (I mean it’s essentially a function that returns a function that returns functions), like it might not feel very good to write as the end user? Although, I do have some industry experience, and the pattern of establishing a resource to use to make other resources isn’t unheard of. For instance, in Unreal, you need to create a delegate handle in order to bind delegates to it. This kind of feels a bit like that, but I’ve also never seen that type of pattern used in serialization before.
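For concreteness, the flag_loader idea could be sketched roughly as below. This is only an illustration under some loud assumptions: a map of string arrays stands in for toml::table, std::optional stands in for expected (so there is no and_then chaining), invalid names are silently skipped, and the flag values are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-ins for this sketch only: a "table" maps a key to an array of strings.
using table = std::unordered_map<std::string, std::vector<std::string>>;
using flag_map = std::unordered_map<std::string, std::uint32_t>;

// Hypothetical flag_loader: refers to the name-to-flag map and hands out
// callables that read one array of flag names from a table.
class flag_loader {
public:
    explicit flag_loader(flag_map const & flags) : flags_(flags) {}

    // Returns a callable shaped like the one from read_value: it takes a
    // table and returns it again on success (std::nullopt if key is missing).
    auto read_flags(std::string key, std::uint32_t & output) const
    {
        // The key is captured by value and the map by pointer, so the callable
        // depends on the map's lifetime but not on the loader's.
        return [flags = &flags_, key = std::move(key), &output](table const & config)
            -> std::optional<table>
        {
            auto const entry = config.find(key);
            if (entry == config.end()) {
                return std::nullopt;
            }
            std::uint32_t result = 0;
            for (auto const & name : entry->second) {
                auto const flag = flags->find(name);
                if (flag != flags->end()) {
                    result |= flag->second; // unknown names are skipped
                }
            }
            output = result;
            return config;
        };
    }

private:
    flag_map const & flags_;
};

inline void example_usage()
{
    flag_map const subsystem_names{ { "video", 0x20u }, { "audio", 0x10u } };
    table init;
    init["subsystems"] = { "video", "audio", "unicorns" };

    flag_loader const loader{ subsystem_names };
    std::uint32_t subsystem_flags = 0;
    auto const result = loader.read_flags("subsystems", subsystem_flags)(init);
    assert(result.has_value());
    assert(subsystem_flags == 0x30u);
}
```

Written this way, the class is little more than a named place to keep the map reference. A free function taking the map as an extra argument would produce the same callable.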

I guess I was just wondering if anyone else had any insight? Like, is this approach unreasonable? Is there a better approach that I might not be considering?

Atiladf · March 8, 2023, 8:12pm

If you will only be passing string literals to ‘read_value’, it would be better to take a std::string_view instead: it is just a pointer and a length referring to an existing string, so no std::string has to be constructed. That is easier on both the compiler and the human reader.
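A small sketch of that suggestion, with hypothetical function names that are not part of raisin. One caveat worth showing: a string_view does not own its characters, so a callable that outlives the call should store its own copy of the path.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: accepts literals, std::string and substrings alike
// without constructing a temporary std::string.
constexpr bool is_top_level(std::string_view variable_path)
{
    return variable_path.find('.') == std::string_view::npos;
}

// A callable that keeps the path around copies it into a std::string,
// because the string_view alone would not keep the characters alive.
inline auto make_path_matcher(std::string_view variable_path)
{
    return [path = std::string{ variable_path }](std::string_view candidate) {
        return candidate == path;
    };
}

inline void example_usage()
{
    assert(is_top_level("player-spawn"));
    assert(!is_top_level("init.subsystems"));

    // The temporary std::string is gone after this line, but the matcher
    // holds its own copy of the characters.
    auto const matches = make_path_matcher(std::string{ "init" });
    assert(matches("init"));
    assert(!matches("player-spawn"));
}
```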
The issue with monadic/optional types is that the compiler can't assume the value is valid: unless it can see the object's whole lifetime and prove otherwise, every access keeps its validity check. If there's a point after which all functions work with valid values, you should get rid of the monadic wrapper as soon as possible and replace it with a custom ValidFlag class that guarantees validity. Those functions would then take this type and never expect failure, which means less branching and better performance.
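One way such a class could look is sketched below. The name valid_flags, the mask-based check, and the flag values are assumptions for illustration. The point is only that validation happens once, at construction, and later functions take the validated type directly.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical "validated once" wrapper: the only way to obtain a valid_flags
// is through validate(), so code that receives one needs no error branch.
class valid_flags {
public:
    // Succeeds only if raw contains no bits outside allowed_mask.
    static std::optional<valid_flags> validate(std::uint32_t raw,
                                               std::uint32_t allowed_mask)
    {
        if ((raw & ~allowed_mask) != 0) {
            return std::nullopt;
        }
        return valid_flags{ raw };
    }

    std::uint32_t value() const { return value_; }

private:
    explicit valid_flags(std::uint32_t value) : value_(value) {}
    std::uint32_t value_;
};

// Takes the validated type, so there is nothing left to check here.
inline bool has_flag(valid_flags flags, std::uint32_t flag)
{
    return (flags.value() & flag) != 0;
}

inline void example_usage()
{
    auto const ok = valid_flags::validate(0x30u, 0x30u);
    assert(ok.has_value());
    assert(has_flag(*ok, 0x20u));
    assert(!valid_flags::validate(0x40u, 0x30u).has_value());
}
```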
std::map and the like are node-based containers (std::map is typically a red-black tree, and std::unordered_map chains its buckets through linked nodes), so every element is a separate heap allocation, which tends to be bad for cache locality and performance. It’s often better to build on std::vector (a class wrapping it and doing some of the map’s work), preallocating (vector::reserve) an amount that covers what you can reasonably expect 90% of the time. If you know all possible flags up front, use a std::array or a static_vector, which can stay on the stack.
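For a fixed, known set of flags, the array-based variant could look roughly like this. The names and flag values are made up (they are not the real SDL_INIT_* constants), and the table has to be kept sorted by name for the binary search to work.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>
#include <utility>

using flag_entry = std::pair<std::string_view, std::uint32_t>;

// Made-up flag values, stored contiguously and sorted by name.
constexpr std::array<flag_entry, 3> subsystem_flags{{
    { "audio", 0x10u },
    { "timer", 0x01u },
    { "video", 0x20u },
}};

// Binary search over the sorted array; no allocation, no node chasing.
inline std::optional<std::uint32_t> find_flag(std::string_view name)
{
    auto const it = std::lower_bound(
        subsystem_flags.begin(), subsystem_flags.end(), name,
        [](flag_entry const & entry, std::string_view key) {
            return entry.first < key;
        });
    if (it == subsystem_flags.end() || it->first != name) {
        return std::nullopt;
    }
    return it->second;
}

inline void example_usage()
{
    assert(find_flag("video").value_or(0) == 0x20u);
    assert(find_flag("audio").value_or(0) == 0x10u);
    assert(!find_flag("unicorns").has_value());
}
```

With only a handful of entries a plain linear scan would likely do just as well. Whether either beats std::unordered_map for a config file that is read once is something to measure, not assume.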