Firstly, check the website http://visula.org/relational for the list of supported compilers. If your compiler is not on the list, you may still be able to use RML if your compiler is standards-compliant. Download and unzip the distributed files. The library is distributed as headers only, so copy the contents of the include directory to the desired location. There is no build step.
Compile the file rm_test.cpp and run it. All being well, you should be informed that the tests pass.
There is just one header file, relational.hpp.
#include <relational.hpp>
All symbols (except macros of course) are in the relational namespace. It might be worth writing
using namespace relational;
near the top of your .cpp file, especially to get the examples to compile.
In RML, all data is stored in tables. Each table has a fixed set of columns, defined by its row type. Data is added to tables in rows, and an empty table contains no rows.
Rows are defined using the RM_DEFINE_ROW_N macro. (Thankfully, this is the only macro.) This macro essentially defines a struct, but also tags it with information to make it usable in tables. The macro takes the form
RM_DEFINE_ROW_N(RowName, Type1, Name1, Type2, Name2 ...)
Where N is the number of members, RowName is the name of the struct, Type1 is the type of the first member, Name1 is the name of the first member and so on. For example
RM_DEFINE_ROW_3(Point, float, x, float, y, float, z)
defines a struct called Point, that essentially looks like
struct Point
{
    float x;
    float y;
    float z;
    Point() { }
    Point(const float &x0, const float &y0, const float &z0) :
        x(x0), y(y0), z(z0) { }
};
The constructor of a row initializes the row with data, and there is also a default constructor, a default copy constructor and a default assignment operator.
Point p1, p2(1,2,3), p3=p2;
Tables are declared using the relational::table template. The row type is supplied as a template argument.
table<Point> points;
defines a table of Points.
Finally, indexes must be specified for the table. An index is a look-up mechanism that makes searching a particular column more efficient. For example to search customers by surname, it could be a good idea to put an index on the surname column of the table. Similarly to search for customers by customer-id, it would be a good idea to put an index on the customer-id column.
Indexes are specified on columns using the relational::indexed and relational::unique templates. These are tags to tell the table to index the specified column. For example:
RM_DEFINE_ROW_3(Contact,
    unique<int>, id,
    indexed<std::string>, surname,
    std::string, telephone)
The unique tag on the id column creates a unique index on id, and the indexed tag on the surname column creates an index on the surname. The telephone column is unindexed.
A table must contain at least one index, so Point must in fact be defined as
RM_DEFINE_ROW_3(Point, indexed<float>, x, float, y, float, z)
The method relational::table::insert() is used to insert a row into a table. For example
points.insert(p1);
points.insert(Point(0,0,0));
If a column is unique, and an attempt is made to insert a duplicate into that column, then the exception relational::duplicate_key will be thrown, and the row will not be added.
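The behaviour of a unique column can be pictured with a small standalone sketch that does not use RML. MiniContact, MiniTable and mini_duplicate_key are hypothetical names, and std::map merely stands in for whatever index structure RML uses internally.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; these are not RML types.
struct MiniContact
{
    int id;
    std::string surname;
};

struct mini_duplicate_key : std::runtime_error
{
    mini_duplicate_key() : std::runtime_error("duplicate key") { }
};

class MiniTable
{
    std::map<int, MiniContact> by_id;   // plays the role of the unique index on id
public:
    // Insert a row, or throw and leave the table untouched if the id already exists.
    void insert(const MiniContact &row)
    {
        if (!by_id.insert(std::make_pair(row.id, row)).second)
            throw mini_duplicate_key();
    }
    std::size_t size() const { return by_id.size(); }
};

// Returns true if a second row with the same id was rejected and not stored.
inline bool duplicate_is_rejected()
{
    MiniTable table;
    table.insert(MiniContact{1, "Smith"});
    assert(table.size() == 1);
    try
    {
        table.insert(MiniContact{1, "Jones"});
    }
    catch (const mini_duplicate_key &)
    {
        return table.size() == 1;   // the duplicate row was not added
    }
    return false;
}
```

Calling duplicate_is_rejected() exercises both the successful and the rejected insert.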
All queries return a Results object. A Results object does not store any data itself, but it knows how to retrieve the results.
The basic way to obtain a Results object is to call the relational::select() function. This function does not actually execute the query, but it merely returns a Results object that can be iterated. It is only as the results are iterated that the query is executed.
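The deferred execution described above can be illustrated without RML. In this sketch LazyResults and lazy_select are hypothetical names, and std::function is used for brevity; RML itself builds its Results types with templates, so this shows only the idea, not the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a Results object (not RML's real type): it remembers
// the source rows and the condition, but evaluates nothing until it is visited.
class LazyResults
{
    const std::vector<int> &rows;
    std::function<bool(int)> condition;
public:
    LazyResults(const std::vector<int> &r, const std::function<bool(int)> &c)
        : rows(r), condition(c) { }

    // The query runs here, one row at a time.
    template<typename Visitor>
    void visit(Visitor visitor) const
    {
        for (std::size_t i = 0; i < rows.size(); ++i)
            if (condition(rows[i]))
                visitor(rows[i]);
    }
};

// Building the results object does no filtering work.
inline LazyResults lazy_select(const std::vector<int> &rows,
                               const std::function<bool(int)> &condition)
{
    return LazyResults(rows, condition);
}

struct LazyDemo { int evaluations_before; int evaluations_after; int matches; };

inline LazyDemo run_lazy_demo()
{
    std::vector<int> balances;
    balances.push_back(0);
    balances.push_back(50);
    balances.push_back(0);

    int evaluations = 0;
    LazyResults zero_balances = lazy_select(
        balances, [&evaluations](int b) { ++evaluations; return b == 0; });

    LazyDemo demo;
    demo.evaluations_before = evaluations;   // nothing has been evaluated yet
    assert(evaluations == 0);

    int matches = 0;
    zero_balances.visit([&matches](int) { ++matches; });

    demo.evaluations_after = evaluations;    // one evaluation per row visited
    demo.matches = matches;
    return demo;
}
```

The condition is not called when lazy_select() returns; it is called once per row only when visit() walks the rows.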
The relational::select() function takes two arguments: a table (or list of tables, see later), and a condition.
template<typename TableRefList, typename Condition>
Results select(const TableRefList &tables,
const Condition &condition)
The condition acts as a filter (or “predicate”), which specifies which rows to return. The result only contains rows for which the condition is true.
There are two special conditions: all(), which is always true, and none(), which is always false. So
select( customers, all() )
returns a Results object containing all of the customers.
select( customers, none() )
returns a Results object containing no results.
A syntax for conditions is provided using the built-in C++ operators to write conditions in a more natural way. They include Boolean operators && (and), || (or) and ! (not), and all of the comparators (==, !=, >, >=, <, <=). It is even possible to perform arithmetic using the operators (+, -, *, /, %).
The column names are members of the table (this magic is performed by the RM_DEFINE_ROW_N macro).
If we define
RM_DEFINE_ROW_3(Customer,
unique<int>, id,
indexed<std::string>, name,
float, balance)
table<Customer> customers;
the expression
customers.id
refers to the id column in the table (as opposed to the id column of a row, which is a concrete value). The condition
customers.id == 5
does not evaluate immediately to a bool value, but returns a condition that is evaluated as the query progresses.
Complex conditions can be built up using the C++ operators. For example
customers.name >= "a" && customers.name < "b"
accounts.balance >= 100 && accounts.balance < 1000 &&
(accounts.status != awaiting_approval ||
accounts.type == premium)
points.x > points.y * points.z
Such conditions can then be used by select() to create Results objects. e.g.
select ( customers,
    customers.name >= "a" &&
    customers.name < "b" )
The select() function will not compile if the condition is invalid, which is a nice feature of template metaprogramming.
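To show how an expression such as customers.id == 5 can yield an object rather than a bool, here is a minimal sketch of the general operator-overloading technique. MiniCustomer, MiniColumn, EqualTo and Both are hypothetical names and this is not RML's implementation; it supports only == and && to keep the idea visible.

```cpp
#include <cassert>
#include <string>

struct MiniCustomer { int id; std::string name; float balance; };

// A column is described by a pointer to the member it stands for.
template<typename T>
struct MiniColumn
{
    T MiniCustomer::*member;
};

// Condition object produced by "column == value"; nothing is compared yet.
template<typename T>
struct EqualTo
{
    T MiniCustomer::*member;
    T value;
    bool operator()(const MiniCustomer &row) const { return row.*member == value; }
};

// Condition object produced by "lhs && rhs".
template<typename L, typename R>
struct Both
{
    L lhs;
    R rhs;
    bool operator()(const MiniCustomer &row) const { return lhs(row) && rhs(row); }
};

template<typename T, typename U>
EqualTo<T> operator==(const MiniColumn<T> &column, const U &value)
{
    EqualTo<T> condition = { column.member, T(value) };
    return condition;
}

template<typename A, typename B>
Both<EqualTo<A>, EqualTo<B> > operator&&(const EqualTo<A> &lhs, const EqualTo<B> &rhs)
{
    Both<EqualTo<A>, EqualTo<B> > condition = { lhs, rhs };
    return condition;
}

// Builds a condition first, then applies it to two rows.
inline bool condition_demo()
{
    MiniColumn<int> id = { &MiniCustomer::id };
    MiniColumn<std::string> name = { &MiniCustomer::name };
    MiniCustomer anna = { 5, "Anna", 0.0f };
    MiniCustomer bob = { 6, "Bob", 10.0f };

    auto condition = (id == 5) && (name == "Anna");   // an object, not a bool
    assert(condition(anna));
    return condition(anna) && !condition(bob);
}
```

Because each operator returns a distinct type, a condition that mixes incompatible operands has no matching overload and is rejected at compile time, which is the property the paragraph above describes.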
The most obvious way to use the results of a query would be to write something like
auto results = select( customers, customers.balance == 0 );
for(auto i = results.begin(); i!=results.end(); ++i)
{
    display_customer(*i);
}
Unfortunately the data type returned by relational::select() is very complex, and auto type deduction was not part of standard C++ before C++11. The relational::for_each() function is therefore provided to visit the results.
template<typename Results, typename Visitor>
void for_each(const Results &results, Visitor &visitor)
The for_each() function iterates through all of the results and calls visitor with each row it finds in results. For example
struct display_customer
{
    void operator()(const Customer & customer)
    {
        std::cout << "Name: " << customer.name <<
            " Address: " << customer.address <<
            std::endl;
    }
};
for_each(select( customers, customers.balance == 0),
display_customer() );
The relational::find_ex() function returns a reference to the first result,
const Customer &customer =
find_ex(select( customers, customers.balance == 0) );
which will throw an exception if no result is found. Alternatively relational::find() will return a pointer to the first result, or 0 if no results were found.
if(const Customer * customer =
    find(select( customers, customers.balance == 0) ) )
{
    // Do something with customer
}
else
{
    // Report "not found"
}
The size() function returns the size of a given results set:
int debtors = select( customers, customers.balance<0 ).size();
size() iterates through all of the results to count the total, so it takes time proportional to the number of results. The exception is table::size(), which takes constant time.
relational::limit() limits the number of results, for example to split results up into blocks.
template<typename Results>
Results2 limit( const Results &r,
unsigned count=1, unsigned offset=0 )
This function returns a Results object that contains the specified sub-range of results.
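The count and offset parameters can be illustrated with a plain std::vector. limit_copy is a hypothetical helper, and the sketch assumes that offset skips that many results before count results are kept; unlike relational::limit(), it copies the sub-range instead of returning a Results object.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the count/offset parameters described above:
// skip `offset` results, then keep at most `count` of those that remain.
template<typename T>
std::vector<T> limit_copy(const std::vector<T> &results,
                          unsigned count = 1, unsigned offset = 0)
{
    // Clamp both ends of the range to the number of available results.
    std::size_t first = std::min<std::size_t>(offset, results.size());
    std::size_t last = std::min<std::size_t>(first + count, results.size());
    assert(first <= last);

    typename std::vector<T>::const_iterator begin = results.begin();
    return std::vector<T>(begin + static_cast<std::ptrdiff_t>(first),
                          begin + static_cast<std::ptrdiff_t>(last));
}
```

With five results, limit_copy(v, 2, 2) keeps the third and fourth, which is how a second page of two items would be obtained.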
The function relational::output_results() writes the results to a stream, by default std::cout. This is useful for debugging queries.
relational::select() can also be passed a list of tables. In this case, the result is a “join” containing every combination of rows that match the condition.
The comma operator is used to create lists of tables. For example
animals
gives a list of length 1, while
animals, plants, minerals
gives a list of length 3. These lists can then be passed to the select() function. For example
select( (animals, plants), all() )
returns a Results object representing every (animal, plant) pair.
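The comma syntax relies on C++ allowing the comma operator to be overloaded. The following sketch shows the general idea for exactly two tables; ToyTable, ToyTableList and count_all_pairs are hypothetical names and are not how RML defines its table lists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template<typename Row>
struct ToyTable
{
    std::vector<Row> rows;
};

// A list of two tables, built by the overloaded comma operator below.
template<typename A, typename B>
struct ToyTableList
{
    const ToyTable<A> *first;
    const ToyTable<B> *second;
};

template<typename A, typename B>
ToyTableList<A, B> operator,(const ToyTable<A> &a, const ToyTable<B> &b)
{
    ToyTableList<A, B> list = { &a, &b };
    return list;
}

// With a condition that is always true, the join is every (a, b) combination.
template<typename A, typename B>
std::size_t count_all_pairs(const ToyTableList<A, B> &tables)
{
    std::size_t pairs = 0;
    for (std::size_t i = 0; i < tables.first->rows.size(); ++i)
        for (std::size_t j = 0; j < tables.second->rows.size(); ++j)
            ++pairs;
    return pairs;
}

inline std::size_t toy_pairs_demo()
{
    ToyTable<int> animals;
    animals.rows.assign(3, 0);
    ToyTable<char> plants;
    plants.rows.assign(2, 'x');
    assert(animals.rows.size() == 3 && plants.rows.size() == 2);

    // The extra parentheses stop the comma being read as an argument separator.
    return count_all_pairs((animals, plants));
}
```

The inner parentheses are needed for the same reason as in select( (animals, plants), all() ): without them the compiler sees two separate function arguments.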
Conditions for multi-table queries work much the same as they do for single-table queries. The condition acts as a filter to return a subset of all possible combinations.
For example
animals.type == herbivore && animals.habitat == plants.habitat
means that only results are returned where the type of animal is a herbivore and the habitat of the plant equals the habitat of the animal.
As in the single-table case, the condition is only evaluated as the results are iterated. If the condition is invalid for the context in which it is used, the program will not compile. Do not use the same table type twice in the list of tables, otherwise the compiler will only use the first one.
relational::select() behaves as though each row (or combination of rows) is tested in sequence. The actual implementation can be much more efficient than that by the use of indexes. RML implements a simple query analyzer (in template meta-programming) that analyses the tables and the condition, and makes appropriate use of indexes.
When more than one index could be used, RML uses the index belonging to the leftmost comparison in the condition.
RML cannot use indexes for operators || and !. Therefore a query such as
select( customers, customers.id == 12 || customers.id == 123)
will not use an index even when an index is available on id, and so the entire customers table is iterated.
In multi-table joins, the search sequence is always from left to right, so joined tables should be written in the sequence in which they are searched. For example
select( (customers, orders),
    customers.name == "smith" &&
    customers.id == orders.customer_id )
In this query, the search strategy is that first the name “smith” is searched in customers (using an index), and then for each matching customer, the orders are found whose customer_id matches customers.id. The search strategy would be entirely different if the query were written
select( (orders, customers),
    customers.name == "smith" &&
    customers.id == orders.customer_id )
In the second case, the entire orders table is iterated, and then for each customer of that order, the name is compared with “smith”. Generally the first strategy would be the more efficient.
RML’s query analyser is not able to guess which would be the more effective strategy; it is up to the programmer (you!) to decide on the join order.
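The difference between the two join orders can be made concrete by counting the rows each strategy touches. This standalone sketch does not use RML: JCustomer, JOrder and the two functions are hypothetical, std::map and std::multimap stand in for the indexes, and the sample data is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct JCustomer { int id; std::string name; };
struct JOrder { int customer_id; int value; };
struct JoinCost { std::size_t rows_examined; std::size_t matches; };

// Strategy 1: customers first, via a name index, then orders via a customer_id index.
inline JoinCost customers_then_orders(
    const std::multimap<std::string, JCustomer> &customers_by_name,
    const std::multimap<int, JOrder> &orders_by_customer,
    const std::string &name)
{
    typedef std::multimap<std::string, JCustomer>::const_iterator CIt;
    typedef std::multimap<int, JOrder>::const_iterator OIt;
    JoinCost cost = { 0, 0 };
    std::pair<CIt, CIt> cs = customers_by_name.equal_range(name);
    for (CIt c = cs.first; c != cs.second; ++c)
    {
        ++cost.rows_examined;
        std::pair<OIt, OIt> os = orders_by_customer.equal_range(c->second.id);
        for (OIt o = os.first; o != os.second; ++o)
        {
            ++cost.rows_examined;
            ++cost.matches;
        }
    }
    return cost;
}

// Strategy 2: every order is visited, its customer looked up by id, then the name tested.
inline JoinCost orders_then_customers(
    const std::vector<JOrder> &orders,
    const std::map<int, JCustomer> &customers_by_id,
    const std::string &name)
{
    JoinCost cost = { 0, 0 };
    for (std::size_t i = 0; i < orders.size(); ++i)
    {
        ++cost.rows_examined;
        std::map<int, JCustomer>::const_iterator c =
            customers_by_id.find(orders[i].customer_id);
        if (c != customers_by_id.end())
        {
            ++cost.rows_examined;
            if (c->second.name == name)
                ++cost.matches;
        }
    }
    return cost;
}

struct JoinDemo { JoinCost first; JoinCost second; };

inline JoinDemo run_join_demo()
{
    const JCustomer people[] = { {1, "smith"}, {2, "jones"}, {3, "brown"} };
    const JOrder sales[] = { {1, 10}, {1, 20}, {2, 5}, {2, 6}, {2, 7}, {3, 99} };

    std::multimap<std::string, JCustomer> by_name;
    std::map<int, JCustomer> by_id;
    for (std::size_t i = 0; i < 3; ++i)
    {
        by_name.insert(std::make_pair(people[i].name, people[i]));
        by_id.insert(std::make_pair(people[i].id, people[i]));
    }
    std::multimap<int, JOrder> by_customer;
    std::vector<JOrder> all_orders;
    for (std::size_t i = 0; i < 6; ++i)
    {
        by_customer.insert(std::make_pair(sales[i].customer_id, sales[i]));
        all_orders.push_back(sales[i]);
    }

    JoinDemo demo = { customers_then_orders(by_name, by_customer, "smith"),
                      orders_then_customers(all_orders, by_id, "smith") };
    assert(demo.first.matches == demo.second.matches);   // same answer either way
    return demo;
}
```

Both strategies find the same two orders, but with this sample data the first examines 3 rows and the second examines 12.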
When the results of a multi-table join are visited with relational::for_each(), a row from each joined table is passed to the visitor. The visitor functor should therefore provide an operator () that accepts N arguments, where N is the number of tables in the join.
For example
// Define the visitor
class add_to_window
{
    CWindow &window;
public:
    add_to_window(CWindow &win) : window(win) { }
    void operator()(const Customer & customer,
                    const Order & order)
    {
        window.add_item (
            customer.name,
            order.date,
            order.value);
    }
};
// Issue a query and call the visitor
for_each(select( (customers, orders),
customers.id == orders.customer_id),
add_to_window(window));
The sum of a column can be found using the relational::sum() function. It takes two arguments: the column to sum, and the results to sum. For example
unsigned total_score = sum(papers.score,
select( papers, papers.student_id == 123) );
float owed = -sum( customers.balance, select( customers, customers.balance<0) );
Results can be sorted on an arbitrary column. relational::sort() takes two arguments, a column to sort on and the results to sort, and returns a new results set that is sorted on the specified column. For example
sort( customers.surname, customers )
sort( orders.date, select( (customers, orders),
    orders.customer_id == customers.id &&
    orders.status == outstanding) )
relational::sort() will use an index where appropriate to sort tables more efficiently. Otherwise the results are collated and post-sorted (although only a vector of pointers is sorted). The function relational::sort_descending() is the same but sorts in reverse order.
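The post-sort on a vector of pointers mentioned above is a common technique, sketched here without RML. SortCustomer, BySurname and sorted_by_surname are hypothetical names; the point is that only the pointers are rearranged while the rows stay where they are.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct SortCustomer { std::string surname; float balance; };

// Compares two rows through pointers, using the surname column.
struct BySurname
{
    bool operator()(const SortCustomer *a, const SortCustomer *b) const
    {
        return a->surname < b->surname;
    }
};

// Collects a pointer to each row and sorts the pointers; rows are not moved or copied.
inline std::vector<const SortCustomer *> sorted_by_surname(
    const std::vector<SortCustomer> &rows)
{
    std::vector<const SortCustomer *> order;
    order.reserve(rows.size());
    for (std::size_t i = 0; i < rows.size(); ++i)
        order.push_back(&rows[i]);
    std::sort(order.begin(), order.end(), BySurname());
    assert(order.size() == rows.size());
    return order;
}
```

A descending order could be produced in the same way by reversing the comparison in BySurname.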
Once a row is contained by relational::table, there is no way to modify it directly. All of the iterators return const references to the row to prevent it being modified. This is because the table relies on indexes that must be kept ordered, so changing the contents of a row directly is an extremely bad idea.
The standard containers (std::set etc.) have a design flaw in that their indexes can be modified with impunity, resulting in undefined behaviour. It is much better to prevent modification of the key using const, rather than to ask the programmer nicely not to modify the key. In fact, the standard containers don’t even provide an efficient reindex; instead, a call to erase() followed by a call to insert() is required.
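The erase-then-insert pattern referred to above looks roughly like this with std::set. Person, ByName and rename_person are hypothetical names chosen for the sketch. (C++17 later added node extraction through std::set::extract, which is not used here.)

```cpp
#include <cassert>
#include <set>
#include <string>

struct Person
{
    std::string name;
    int age;
};

// The set is ordered, and therefore indexed, by name.
struct ByName
{
    bool operator()(const Person &a, const Person &b) const { return a.name < b.name; }
};

typedef std::set<Person, ByName> People;

// Changes the key of one element by removing it and inserting a modified copy.
// Returns false, leaving the set unchanged, if the old name is missing or the
// new name is already present.
inline bool rename_person(People &people,
                          const std::string &old_name,
                          const std::string &new_name)
{
    Person probe = { old_name, 0 };
    People::iterator it = people.find(probe);
    if (it == people.end())
        return false;

    Person changed = *it;
    changed.name = new_name;
    if (people.count(changed) != 0)
        return false;

    people.erase(it);                                  // first tree operation
    bool inserted = people.insert(changed).second;     // second tree operation
    assert(inserted);
    (void)inserted;
    return true;
}
```

The element is copied, removed and inserted again, so the tree is searched and rebalanced twice for a single logical update.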
RML on the other hand provides a safe zero-overhead method to modify rows.
The main way to update a table is with the relational::table::update_where() function. This function updates every row in the table that matches the specified condition. For example
// Set the name of the customer with id 123 to "Anna Carlson"
customers.update_where(customers.name = "Anna Carlson",
    customers.id == 123);
// Update a customer's name and address
customers.update_where(
    (customers.name = "Anna Carlson") &&
    (customers.address = "32 Stureplan"),
    customers.id == 123);
// Clear all alerts
alerts.update_where( alerts.status=0, all() );
// Update all balances if they are bonus_eligible
customers.update_where(
customers.balance = customers.balance + 10,
customers.status == bonus_eligible);
Individual rows can be updated, providing that the row is actually contained in the given table.
customers.update_row<customer1.balance_column>(customer1, 100);
Notice something strange, which is generally hidden from the RML library user. The column is referred to by number, which is actually how RML operates internally. The RM_DEFINE_ROW_N macro defines the members XXX_column in the row, where XXX is the column name.
table::update_row() is extremely efficient. The template knows if the column is indexed, and if it is not, then the implementation simply updates the row. However, if the column is indexed, then the node that the row belongs in is moved within the index. A duplicate_key exception can be thrown at this point, in which case the row is unchanged.
If you already have a reference to a row, table::update_row() is more efficient than table::update_where().
Care must be taken to not interleave updates with queries, or the results could be unpredictable.
A particular row can be removed from a table by calling the table::erase_row() function with that row. The given row must be a member of the table. However, table::erase_row() will invalidate any query using the table.
A much safer way to erase rows is to use the function table::erase_where(). This will erase all rows in a table matching a particular condition. For example
customers.erase_where(customers.id == 123);
accounts.erase_where(
accounts.status == closed &&
    accounts.close_date < "2002-01-01");
To erase an entire table, call
mytable.erase(all());
or just
mytable.erase();
which is actually slightly more efficient. (Another way to erase a table is to swap it with an empty table.)
The << and >> operators have been overloaded to load and save tables. To load or save the entire state of an application, the tables can simply be written to or read from a file stream. This is much easier than writing serialization routines.
// Save tables
file << customers << accounts << orders;
// Load tables
file >> customers >> accounts >> orders;
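The chaining shown above works for any type whose stream operators return the stream. As a rough illustration that does not use RML, the sketch below gives a vector of rows a pair of such operators. SavedPoint, PointTable and the text format (a row count followed by one row per line) are assumptions made for the example; nothing here describes RML's actual file format.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

struct SavedPoint { float x, y, z; };
typedef std::vector<SavedPoint> PointTable;

// Writes the row count, then one row per line.
inline std::ostream &operator<<(std::ostream &out, const PointTable &table)
{
    out << table.size() << '\n';
    for (std::size_t i = 0; i < table.size(); ++i)
        out << table[i].x << ' ' << table[i].y << ' ' << table[i].z << '\n';
    return out;
}

// Reads back what operator<< wrote; the table is replaced only if reading succeeds.
inline std::istream &operator>>(std::istream &in, PointTable &table)
{
    std::size_t count = 0;
    in >> count;
    PointTable loaded;
    for (std::size_t i = 0; i < count && in; ++i)
    {
        SavedPoint p;
        if (in >> p.x >> p.y >> p.z)
            loaded.push_back(p);
    }
    if (in)
        table.swap(loaded);
    return in;
}

inline bool round_trip_demo()
{
    PointTable saved;
    SavedPoint a = { 1.0f, 2.5f, -3.0f };
    SavedPoint b = { 0.0f, 0.0f, 0.0f };
    saved.push_back(a);
    saved.push_back(b);

    std::stringstream file;   // stands in for a file stream
    file << saved;

    PointTable loaded;
    file >> loaded;
    assert(!file.fail());
    return loaded.size() == 2 && loaded[0].y == 2.5f && loaded[1].z == 0.0f;
}
```

Because each table is prefixed with its row count, several tables can be written one after another and read back in the same order.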
If the compiler complains about wide strings, #define RM_DISABLE_WCHAR.
RML tables can store C++ objects, not just rows. This allows for a more object-oriented style of programming, and allows indexes and queries on collections of objects, not just collections of rows.
When defining objects to be stored in tables, RML stores a pointer to the table within each object. Although this is a small overhead, it allows objects to modify their own members via the table. Because the stored object is const, the programmer cannot modify it illegally without resorting to const_cast, which should be sufficient warning that something dangerous is being done. RML’s modification mechanism is more efficient than the STL because it has the facility to reposition an object in an index.
This facility is provided by the relational::table_row template. It takes two parameters: the derived class and the base class. The base class must be defined using RM_DEFINE_ROW_N, in order to give it the machinery it needs to be stored in a table.
RM_DEFINE_ROW_3(ProductBase,
unique<int>, id,
unique<int>, code,
indexed<std::string>, description)
class Product : public table_row<Product, ProductBase>
{
public:
    Product(int id0, int code0, const std::string &desc0)
    {
        id = id0;
        code = code0;
        description = desc0;
    }
    const std::string &getDescription() const
    {
        return description;
    }
    void setDescription(const std::string &desc) const
    {
        update_column<description_column>(desc);
        // Not: description = desc;
    }
};
Note that setDescription() is const. Although this looks odd, it is the only safe way to prevent the programmer from accidentally modifying the object and thereby corrupting the indexes.
The template relational::table_row provides an update_column method that updates the specified column, which also repositions the object in the table if the given column is indexed. There is no overhead in this function call for an unindexed column.
For example the implementation of Product::setDescription() uses table_row::update_column(), which calls table::update_row() to modify the index in the table as well as the contents of the row.
A table of Products is declared in the same way as any other table:
typedef table<Product> Products;