The basic problem for any container or database-type library is how to find out about the contents of a struct. C++ lacks reflection-like capabilities (run-time type information) which means that when a container is presented with something like
struct value
{
int x;
std::string y;
float z;
};
the container has no way of peering into the value-type to do clever things with its members. RTTI would be an acceptable solution, but would be painfully slow in comparison to compile-time analysis. A system built upon run-time types would also lack the type safety of a compile-time metaprogramming solution.
RML’s approach is to refer to members by number. The RM_DEFINE_ROW macro adds a type-list to the struct, and adds a member template that can be used to manipulate columns by number, using the relational::column() function. The relational::column() function is as efficient as referring to a member directly, but is much easier for a meta-program to use since it only needs to be concerned with a single integer to refer to a member of a structure.
The reason this has to be a macro is because it needs to work with names, not just types. It was felt that the benefits of being able to write customer.name or customers.name instead of column<1>(customers) or col<1,1>() outweighs the inherent disadvantages of using a macro.
The RM_DEFINE_ROW macro (column_names.hpp) defines a struct called table_columns in the row type, which has members of type relational::table_column_number. relational::table inherits from this struct, thereby injecting the names into the table object. This enables the following code to make sense:
table<value> values;
values.x; // Has type relational::table_column_number<value,0>
values.y; // Has type relational::table_column_number<value,1>
This allows queries to be formulated using these special table members that were inherited from the row type, e.g.
select(values, values.x<0)
The Standard containers only have one index, which is generally implemented using a red-black tree. However they lack the ability to update a key, and the ability to index on more than one key simultaneously. For this reason, the Standard containers were not suitable, and another implementation had to be found.
Boost.MultiIndex would have offered a perfectly good solution, however it did not compile on my compiler at the time! So I just wrote the parts that were needed.
The basic task of a multi-index is to combine several nodes into one. This means that there is only one memory allocation per row (plus memory allocation performed by data members, e.g. strings), no matter how many keys there are. RML only required a tree-based implementation, not a list. The multi-index also provides an efficient reindex facility without erasing and recreating the node.
To implement the multi-index, the red-black algorithms had to be templated and parameterised by index number. Each index used a different key and it all works.
An important step is how to get to get from a type-list (contained in the row) to an index list. A simple meta-algorithm transforms the type-list into a list of index types.
The template relational::table (table.hpp) implements tables. It delegates most of the real work to relational::multi_table (table.hpp) which is a multi-indexed container. This manages a list of tables of type relational::mt_tree (multitree.hpp) which is a basic red-black tree that has been made more generic to cope with several nodes per multi-node (relational::multi_node). The template relational::sort_column (table.hpp) provides a comparator for relational::multi_nodes.
The template relational::utils::gen_indexes1 (table.hpp) analyses the row type, and builds up a list of indexes. This is then passed to relational::multi_table (table.hpp), which uses relational::utils::convert_indexes_to_node to define a node-type, and relational::utils::instantiate_index to combine relational::mt_trees into one container.
The traits class currently just contains an allocator. However the reason it is a traits class is for extensibility. One quite cool thing to do would be to be able to have the table connect to a real database, not just to operate in memory. By using a traits class, it would be possible to provide an alternative implementation without affecting the external interface.
Results in RML are iterated rather than stored. The basic paradigm is that each query is represented by a container – or something that is very close to a C++ container. If certain features (such as size()) cannot be implemented exactly to the standard, it is still useful having something with a nearly-container like interface.
Results containers don’t actually store any results, they merely know how to iterate the results. This is the only efficient solution, since it would be inefficient to store entire sets of results when they are already stored in the tables. The exception is of course the relational::table container which does actually store the rows.
These results containers can then be composed efficiently, e.g.
sum(limit(select( ... ) ) )
without copying large numbers of results. It is only when results are iterated that any processing is actually performed. Most importantly, this is memory efficient – there is no memory allocation required for a query.
A select iterator is implemented as a pair of column iterators, which are effectively a pair of pointers to nodes in a multi-tree. The ++ operator performs a ++ on the relevant column in the table, which traverses the tree of the particular index. The iterator is at the end when the two column iterators are equal.
Multi-table selects are implemented as N pairs of iterators, where N is the number of tables in join. These are advanced in a depth-first search manner, equivalent to writing nested loops.
There is one interesting task when trying to compose a list of types. In the list
select( (a,b,c,d), ...)
the comma operator associates to the left, ending up with the type
list<list< list <a,b>, c>, d>
This is transformed into the type
list<a, list<b, list<c, list<d, empty> > > >
(this metaprogram is an exercise to the reader).
The query analyser is responsible for selecting an index from each table in a join.
Without a query analyser, there would be no choice but to iterate through each row in turn, starting at begin() and finishing at end(). Indeed, when the query analyser does not find a suitable column to iterate, it selects the default column (the first column with an index), and does actually iterate between begin() and end().
The query analyser analyses the type of the condition and detects comparators ==, <, <=, >, >=. All of these comparators bound the query, and when performed on an indexed column, can make the query more efficient. Instead of starting at begin() and ending at end(), it can start iterating from a specific key, and finish iterating at a specific key. This obviously reduces the search space and makes the query efficient.
One particular struct to look at is relational::select_join_t::table_info (in select.hpp). This uses the get_index_column member of the Condition to suggest a column to index upon. the get_index_column member of a condition will recursively use get_index_column in the sub-conditions in order to find an index for a particular table.
Possibly this library should become integrated with (though not necessarily become a part of) Boost. Boost contains many libraries that RML could make use of, including
In the end, it is laziness on the part of the author that Boost was not used. The really useful library would be MultiIndex, but I ended up writing the parts that I needed. There is a subtlety in the implementation which requires that left-insertion is needed instead of MultiIndex’s right-insertion in order to prevent infinite loops. Also repositioning nodes in the index was needed. Nevertheless I did get inspiration from Boost.MultiIndex, when read the description and realized that multiple nodes could be combined in a single object. So thanks to Joaquín M López Muñoz.
As a result, RML is a completely independent C++ library, which is great for people who do not use Boost but may want to try RML.
This work will not get done unless there is demand for it.