Tagged container

In GObject, every object has methods like g_object_get_data, g_object_set_data, and friends, which let you attach arbitrary data to objects. This may be familiar from scripting languages like JavaScript, where all attributes are attached like this.

js> var x = new Object
js> x
js> = 3
js> x

What if you want to do this in C++? One solution is to simply use a std::unordered_map<std::string, void*>. This is roughly what GObject implements, minus the signals. However, it has problems: there is no type safety at all, and someone has to delete the pointer manually. A slightly better idea is to use std::unordered_map<std::string, boost::any>. It provides type safety, and (unless you put a pointer into boost::any) it’ll also take care of freeing your memory. We could move on, as we solved this problem, right? No, it still has problems:

  1. Name clashing: all that identifies your object is a simple string, so unless you work out some kind of “naming standard” that keeps unrelated modules from choosing the same name, it can bite you. (Saying that you know all the keys in use and you’re sure they’re all different is the wrong answer, because in that case a simple struct would do the job, probably with boost::optional or some kind of smart pointer members. You only need this solution if you want to store really arbitrary stuff, probably provided by a plugin or the user.)

  2. It has type safety, but only at runtime. Nothing stops you from writing map["user_name"] = 3;, even though it clearly makes no sense. As in any dynamically typed language, it’ll only cause problems later, when you try to access it and expect a string (boost::any_cast<std::string>(map["user_name"])).
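
To make the second point concrete, here is a minimal sketch of the runtime-only check. It assumes a C++17 compiler and uses std::any as a stand-in for boost::any (the two behave alike for this purpose); AnyMap, ReadAsStringFails and WrongTypeIsOnlyCaughtOnRead are hypothetical names made up for the illustration.

```cpp
#include <any>
#include <cassert>
#include <string>
#include <unordered_map>

// Assumption: std::any (C++17) stands in for boost::any here.
using AnyMap = std::unordered_map<std::string, std::any>;

// Returns true if reading `key` as a std::string fails at runtime.
bool ReadAsStringFails(AnyMap& map, const std::string& key)
{
    try {
        (void)std::any_cast<std::string>(map[key]);
        return false;
    } catch (const std::bad_any_cast&) {
        return true;
    }
}

// Hypothetical demo: the wrong assignment compiles, only the read notices.
bool WrongTypeIsOnlyCaughtOnRead()
{
    AnyMap map;
    map["user_name"] = 3;  // compiles fine, although a name should be a string
    return ReadAsStringFails(map, "user_name");
}
```

The assignment and the failing read can be arbitrarily far apart in the code base, which is what makes this kind of bug annoying to track down.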

boost::any itself also has some peculiar behaviours worth noting:

  1. Because of how C++ templates, typeids and thus boost::any work, an int is different from a long, even if they have the same size! But still, my favourite: char, signed char and unsigned char are three distinct types (the C++ standard requires this; it is not specific to the Itanium ABI). So whenever you put any integer type into boost::any, you should explicitly cast it to the desired type.

  2. This also applies to types that aren’t built in, for example with inheritance:

    #include <typeinfo>
    #include <boost/any.hpp>
    class Base {};
    class Derived : public Base {};
    int main()
    {
        boost::any x((Derived()));
        // Ask for the base type although a Derived was stored.
        boost::any_cast<Base>(x);
    }

    Will it throw an exception? Yes, because typeid(Base) != typeid(Derived). So in this case you have to store a Base in the boost::any, except that doing so results in slicing, so you need a pointer. And since boost::any requires copyable values, that means a copyable smart pointer such as std::shared_ptr (std::unique_ptr won’t compile), unless, of course, you like memory leaks.

  3. Another probably surprising behaviour: if you have a function like void foo(const std::string& str), you can call it like foo("bar"). What happens with boost::any? You would expect it to end up holding a const char*, but no, it’ll cause a compiler error (as of boost 1.54 and gcc 4.8/clang 3.3). Even if it compiled, the stored type would be something like char [4]. Constructing from an explicit const char* works, though.
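
The three points above all boil down to exact type identity. As far as I can tell, boost::any_cast only succeeds when the typeid of the stored value matches the requested one, so the following sketch models that check with plain typeid and no Boost at all. SameTypeId is a hypothetical helper, not part of any library.

```cpp
#include <cassert>
#include <type_traits>
#include <typeinfo>

class Base {};
class Derived : public Base {};

// The three character types are distinct, and so are int and long,
// regardless of whether their sizes happen to match on a given platform.
static_assert(!std::is_same<char, signed char>::value, "distinct types");
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");
static_assert(!std::is_same<int, long>::value, "distinct types");

// Hypothetical helper: true when both types have the same typeid,
// which is the condition an exact-type cast has to satisfy.
template <typename Stored, typename Requested>
bool SameTypeId()
{
    return typeid(Stored) == typeid(Requested);
}
```

With this, SameTypeId<Derived, Base>() and SameTypeId<char[4], const char*>() are both false, which matches the inheritance and string literal surprises described above.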

In a nutshell, there’s a lot of surprising behaviour, and the boost docs don’t say a word about it… Replacing boost::any with boost::variant solves most of boost::any’s annoyances, but then you’re limited to a small, fixed set of types.

Tag it

The idea comes from Boost Exception. I actually have no idea why they didn’t factor it out into a generic container, and why only exceptions have it. Anyway, the main idea is that instead of random strings, you declare random tags. I couldn’t really find a definition of tag, so maybe I shouldn’t call them tags, but anyway, they are classes whose only purpose is to help select the appropriate template.

The usage looks like this: first you declare your tags:

class UserNameTag : public Tag<std::string> {};
class FileSizeTag : public Tag<std::size_t> {};
// etc...

You create a new class, inheriting from Tag. You specify the type you want to store as the Tag’s template argument. So, into UserNameTag you can only store a std::string, and not an int or whatever. Similarly, FileSizeTag stores a size_t.

map.Set<UserNameTag>("dirty_ice"); // calls the implicit std::string constructor
map.Set<UserNameTag>(77);          // error - no conversion from int to std::string

map.Set<FileSizeTag>(128);         // int implicitly converted to size_t
map.Set<FileSizeTag>('c');         // char -> size_t
map.Set<FileSizeTag>("file.txt");  // error

As you can see, instead of operator[] or a function argument, you have to specify the Tag as a template parameter. So instead of map["foo"] = bar you have map.Set<Foo>(bar). Getting works the same:

auto& s = map.Get<FileSizeTag>(); // no need to specify type
// s is a size_t& now
s += 77;

As you can see, you don’t have to specify the type of the stored item, as it’s deducible from the tag. And unlike with boost::any, it must be a size_t, because you can’t put anything else in it. So the only way it can go wrong is if you didn’t set that tag previously; in this case, like at() in std::vector/map, it will throw an exception. There’s DefaultGet, which works like operator[]: it default constructs and sets an object in this case. Finally there’s a MaybeGet, which returns a boost::optional<TagType&>. To remove a previously set item, you can call Erase<Tag>(). I didn’t implement a Clear() method: if you can’t be sure what’s inside the container, you probably shouldn’t clear it all out. But if you want one, implement it; you just have to call the underlying unordered_map’s clear.


First the tags. Boost Exception uses typedef error_info<struct tag1, int> tag2;, where you forward declare a struct and typedef a template, and have to come up with a name for both the tag and the typedef. Instead, I’ve chosen to ask users to derive from a given class. Tag<type> itself is never used in user code; it’s a struct that has a base class (TagBase) with a virtual destructor and contains an instance of type. Thus it’s just type, but now we can delete it polymorphically (that’s needed at least in the destructor).

Onto the container: it’s just a std::unordered_map<std::type_index, std::unique_ptr<TagBase>> and some methods to manipulate it. The key is the type_info of the user-derived Tag type (UserNameTag, FileSizeTag in the above example), wrapped in a std::type_index so it can be used as a key in a container. The value is a pointer to TagBase; std::unique_ptr makes sure we won’t leak memory.
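
The pieces described so far can be sketched roughly like this. It is a guess at the shape, not the code from the gist: the member name value, the Storage alias, HandleTag and UseCountsAroundErase are all made up for the illustration. The helper stores a std::shared_ptr so that the use count shows whether deleting through a TagBase* really destroys the stored value.

```cpp
#include <cassert>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

// Common base so that any Tag<T> can be destroyed through a TagBase*.
struct TagBase
{
    virtual ~TagBase() = default;
};

// Holds one value of the stored type; user tags derive from this.
template <typename T>
struct Tag : TagBase
{
    using Type = T;
    explicit Tag(const T& v) : value(v) {}
    T value;  // assumed member name
};

// Hypothetical tag used only for this demonstration.
class HandleTag : public Tag<std::shared_ptr<int>> {};

using Storage = std::unordered_map<std::type_index, std::unique_ptr<TagBase>>;

// Returns the shared_ptr use count while the copy is stored and after erasing it.
std::pair<long, long> UseCountsAroundErase()
{
    auto handle = std::make_shared<int>(42);
    Storage storage;
    const std::type_index key(typeid(HandleTag));
    storage[key].reset(new Tag<std::shared_ptr<int>>(handle));
    const long while_stored = handle.use_count();  // owner outside + copy inside
    storage.erase(key);  // unique_ptr deletes via TagBase*, virtual dtor runs ~Tag
    const long after_erase = handle.use_count();
    return {while_stored, after_erase};
}
```

Without the virtual destructor in TagBase, the delete through the base pointer would be undefined behaviour, and in practice the stored copy would most likely never be released.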

Getting, setting: every getter and setter is a template <typename T>, where T is the Tag type. There’s an EnsureTag<T>() call that prevents someone from accidentally using Tag<something> directly. Other than that, it’s pretty basic stuff. Set looks like this:

template <typename T>
void Set(const typename T::Type& val)
{
    map[typeid(T)].reset(new Tag<typename T::Type>{val});
}

We get the appropriate slot from the map (default constructing it if needed, which is not a problem since it’s only a std::unique_ptr) and overwrite it with a newly allocated instance (std::unique_ptr automatically frees the previous instance, if any). You may notice that I allocated Tag<T::Type>, and not T: this way, even if the user for some reason puts extra stuff into T, it won’t cause problems. Getters are similar, except that you have to cast the item to the right Tag<T::Type>*.
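
Putting the description together, a compact version of the whole container might look like the sketch below. Treat it as an approximation under assumptions rather than the gist’s code: the class name TaggedContainer, the value member, the std::out_of_range exception and the static_assert inside EnsureTag are my guesses, and MaybeGet is left out to keep the sketch free of Boost.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

struct TagBase { virtual ~TagBase() = default; };

template <typename T>
struct Tag : TagBase
{
    using Type = T;
    Tag() : value() {}                      // value-initializes (0 for integers)
    explicit Tag(const T& v) : value(v) {}
    T value;
};

class TaggedContainer
{
public:
    template <typename T>
    void Set(const typename T::Type& val)
    {
        EnsureTag<T>();
        map[Key<T>()].reset(new Tag<typename T::Type>(val));
    }

    // Like at(): throws if the tag was never set (exception type is an assumption).
    template <typename T>
    typename T::Type& Get()
    {
        EnsureTag<T>();
        auto it = map.find(Key<T>());
        if (it == map.end()) throw std::out_of_range("tag not set");
        return static_cast<Tag<typename T::Type>&>(*it->second).value;
    }

    // Like operator[]: default constructs the value if the tag is missing.
    template <typename T>
    typename T::Type& DefaultGet()
    {
        EnsureTag<T>();
        auto& slot = map[Key<T>()];
        if (!slot) slot.reset(new Tag<typename T::Type>());
        return static_cast<Tag<typename T::Type>&>(*slot).value;
    }

    template <typename T>
    void Erase() { EnsureTag<T>(); map.erase(Key<T>()); }

private:
    template <typename T>
    static std::type_index Key() { return std::type_index(typeid(T)); }

    // Rejects Tag<X> itself: T must be a class derived from Tag<T::Type>.
    template <typename T>
    static void EnsureTag()
    {
        static_assert(std::is_base_of<Tag<typename T::Type>, T>::value &&
                      !std::is_same<Tag<typename T::Type>, T>::value,
                      "T must be a class derived from Tag<...>");
    }

    std::unordered_map<std::type_index, std::unique_ptr<TagBase>> map;
};

class UserNameTag : public Tag<std::string> {};
class FileSizeTag : public Tag<std::size_t> {};
```

Because the parameter type of Set is fixed by the tag rather than deduced from the argument, Set<UserNameTag>("dirty_ice") goes through the usual implicit conversion to std::string, and Set<UserNameTag>(77) fails to compile.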

And that’s probably all. It’s a pretty basic implementation, without any kind of copy support. But this way there are (almost) no requirements on what you can put into the container: it must be constructible and destructible. DefaultGet additionally requires default construction, of course, but nothing else does.

The code is so basic that I’ve only created a gist for it: TaggedContainer gist.

Final stuff (aka. conclusion)

So, what did we gain? Other than stranger syntax, we have a container that can hold arbitrary objects indexed by a tag, yet provides compile-time type safety at the point where you set a value, not when you try to retrieve it. Thanks to this compile-time type safety, the problems with the various integer types and with implicit constructors not being called are also solved. The tag system also protects pretty well against accidental name clashes (we have namespaces and nested classes). The derived class problem persists, but it’s a general C++ problem: we can’t store a Derived in place of a Base, because Derived may be bigger.

On the downside, we can’t have a single key that holds either a std::string or an int. We can solve this with boost::variant though, which also comes with a useful visitor feature. We also can’t generate keys at runtime. However, in most cases an extra container solves the problem (like class FooTag : public Tag<std::vector<Whatever>> {};).

So we have something like a struct that can be extended with new members at runtime. It is a bit more restrictive than the boost::any solution because of the strict type checking. But there’s nothing to stop you from writing class AnyTag : public Tag<boost::any> {}; ;)

An interesting extra feature: if the container class has no methods to enumerate the stored objects (and my implementation has none, because all it really has is the Tag’s type_info and a pointer to some struct, and you have to know the Tag, or the Tag’s underlying type, to make anything out of it), you can have private items. Here’s how: declare the tag as a protected/private inner class; then only your class and its friends (and derived classes, for a protected tag) can access the tag type, and thus the required template instantiations. Alternatively, you can use unnamed namespaces to restrict access to a compilation unit.
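
Here is a small sketch of the private item trick. The container is a cut-down stand-in with only Set and a pointer-returning Find (both my simplification, not the gist’s interface), and SessionPlugin with its TokenTag is an invented example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

struct TagBase { virtual ~TagBase() = default; };

template <typename T>
struct Tag : TagBase
{
    using Type = T;
    explicit Tag(const T& v) : value(v) {}
    T value;
};

// Cut-down stand-in for the container: no way to enumerate what is stored.
class TaggedContainer
{
public:
    template <typename T>
    void Set(const typename T::Type& val)
    {
        map[std::type_index(typeid(T))].reset(new Tag<typename T::Type>(val));
    }

    // Returns nullptr when the tag is not set.
    template <typename T>
    typename T::Type* Find()
    {
        auto it = map.find(std::type_index(typeid(T)));
        if (it == map.end()) return nullptr;
        return &static_cast<Tag<typename T::Type>&>(*it->second).value;
    }

private:
    std::unordered_map<std::type_index, std::unique_ptr<TagBase>> map;
};

// Hypothetical plugin that keeps its own state inside a shared container.
class SessionPlugin
{
public:
    static void Remember(TaggedContainer& c, const std::string& token)
    {
        c.Set<TokenTag>(token);
    }

    static bool Knows(TaggedContainer& c, const std::string& token)
    {
        const std::string* stored = c.Find<TokenTag>();
        return stored && *stored == token;
    }

private:
    // Private nested tag: code outside SessionPlugin cannot name this type,
    // so it cannot write Set<SessionPlugin::TokenTag> or Find<...> itself.
    class TokenTag : public Tag<std::string> {};
};
```

Anyone holding the container can pass it around, but writing c.Find<SessionPlugin::TokenTag>() outside the class is rejected by the compiler because TokenTag is private.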