Persist

A memory allocator for high-performance persistent data storage in C++.

Introduction

Persist is a C++ memory allocator for mapped memory.  It maps a region of memory to a file.  Any data structures created in the mapped memory region persist between invocations of the program.  The programmer can completely forget about loading or saving data to various places like files, a registry, environment variables, or a database.  Performance is as good as normal memory; see the benchmarks.
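The idea of memory that outlives the program can be illustrated without Persist at all.  The following is a minimal sketch using plain POSIX calls (open, ftruncate, mmap); it is not Persist's implementation, and the function name bump_counter is made up for the illustration.  It keeps a single int in a mapped file and increments it on every call.

```cpp
// Sketch only: plain POSIX calls, not Persist's own implementation.
#include <cassert>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Maps `path` into memory, increments the int stored at the start of the
// file, and returns the new value.  Returns -1 on any failure.
int bump_counter(const char *path)
{
    assert(path != nullptr);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    // A new file is empty; extend it so there is room for one int.
    // The extended part reads as zero, so the counter starts at 0.
    if (ftruncate(fd, sizeof(int)) != 0)
    {
        close(fd);
        return -1;
    }

    void *p = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);      // The mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED)
        return -1;

    int *counter = static_cast<int *>(p);
    int value = ++*counter;     // An ordinary memory write...
    munmap(p, sizeof(int));     // ...whose result lives in the file
    return value;
}
```

Calling bump_counter("counter.map") from successive runs of a program would return 1, 2, 3 and so on, with no explicit load or save step.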

Persist's memory maps can be shared between processes on the same machine.  This allows several programs to communicate and access the memory file simultaneously.  NPTL is required to do this on Linux, since linuxthreads does not provide mutual exclusion between processes.

Persist is fully compatible with the C++ STL, providing persistent containers.

Limitations

Although Persist is fast, it offers no safeguards on its contents.  If the application trashes it, the data is lost.  The data is also lost if the data structures change, since this makes the program incompatible with the old data file.  Data is stored in binary format (as it would be in memory), so the data stored is not portable.

The system relies on being able to map the same address in virtual memory.  If the application is unable to map the file to the same location, the file is unreadable.  Don't use memory maps to store important data.
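Why the address matters can be seen in a small simulation.  The sketch below does not use Persist; it treats a byte array as a stand-in for the mapped region, and the names Header, build_region, read_by_offset and absolute_points_into are invented for the illustration.  The region holds one absolute pointer and one offset that both refer to the same int.  Copying the bytes to a second array plays the role of the file being mapped at a different address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// A header at the start of the region refers to an int stored after it,
// once by absolute address and once by offset from the region start.
struct Header
{
    int *absolute;          // Only meaningful while the region stays put
    std::size_t offset;     // Meaningful wherever the region is mapped
};

const std::size_t region_size = 64;

// Lays out a Header at the start of `region`, followed by `value`.
void build_region(unsigned char *region, int value)
{
    assert(region != nullptr);
    Header h;
    h.offset = sizeof(Header);
    h.absolute = reinterpret_cast<int *>(region + h.offset);
    std::memcpy(region, &h, sizeof h);
    std::memcpy(region + h.offset, &value, sizeof value);
}

// Reads the value through the offset, relative to wherever `region` is now.
int read_by_offset(const unsigned char *region)
{
    Header h;
    std::memcpy(&h, region, sizeof h);
    int value;
    std::memcpy(&value, region + h.offset, sizeof value);
    return value;
}

// Reports whether the stored absolute pointer still points inside `region`.
bool absolute_points_into(const unsigned char *region)
{
    Header h;
    std::memcpy(&h, region, sizeof h);
    std::uintptr_t p = reinterpret_cast<std::uintptr_t>(h.absolute);
    std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(region);
    return p >= begin && p < begin + region_size;
}
```

After the copy, the offset still finds the value but the absolute pointer refers to the old location.  Ordinary C++ objects and STL containers hold absolute pointers, which is why the file must come back at the same address.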

Persist suffers from the limitations of a 32-bit address space.  You cannot map a file greater than 4GB.  In practice, that is reduced further to around 2GB due to operating-system limitations.

Creating persistent data

persist::map_data<T>

Normally all access to persistent data is done through the persist::map_data templated class.  It provides a type-safe wrapper around raw bytes stored in the file.  For example:

#include <persist_stl.h>
#include <iostream>

using namespace persist;

class AppData
{
public:
    struct Email
    {
        string address, public_key;
        bool encrypt;   // This person prefers encrypted email
        bool plaintext; // This person prefers plaintext
    };

    struct WebPage
    {
        string url, title;
    };

    int win_x, win_y, win_height, win_width;    // Where the window was last placed

    map<string, Email> contacts;    // List of email contacts
    vector<WebPage> favourites;     // List of favourite web-pages
};

int main(int argc, char *argv[])
{    
    map_data<AppData> appdata("browser.map");

    if(appdata)     // Checks that the data was initialized properly
    {
        // Add a dummy contact

        AppData::Email mike;
        mike.address = "mike@yahoo.com";
        mike.encrypt = false;
        mike.plaintext = false;

        appdata->contacts["Mike Smith"] = mike;

        std::cout << "Contact added\n";
        return 0;
    }
    else
    {
        std::cout << "Could not initialize map\n";
        return 1;
    }
}

Normally an application would have just one persistent structure, containing all the persistent data for the application.  The constructor of the object is only called the first time the map is created.

map_data(const char *filename, size_t size, int flags, size_t map_address)

Creates or opens an existing memory-mapped file.

filename
  The name of the file containing the data.
size
  The initial size of the file.  The file may grow beyond this size if auto_grow is specified in flags (the default).
flags
  Flags specifying how the memory is mapped.  Combine these options using |

  persist::auto_grow      The file grows automatically when it gets full.
  persist::private_map    Any data written to the map is "private": it will not be shared or written to the file.

map_address
  Specifies where in the address space the memory should be mapped.  See mmap() for more details.
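The difference between a shared and a private mapping can be demonstrated with raw POSIX calls.  This is a sketch under the assumption that persist::private_map behaves like mmap()'s MAP_PRIVATE; the source does not say how the flag is implemented, and the helper name write_via_mapping is made up.  The function writes one byte through a mapping and then reads the file with pread() to see what actually reached it.

```cpp
// Sketch only: assumes private_map behaves like POSIX MAP_PRIVATE.
#include <cassert>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Writes `value` to the first byte of `path` through a mapping created with
// `map_flags` (MAP_SHARED or MAP_PRIVATE), then reads that byte back from
// the file itself.  Returns the byte found in the file, or -1 on failure.
int write_via_mapping(const char *path, int map_flags, unsigned char value)
{
    assert(path != nullptr);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 1) != 0)      // Make sure the file holds one byte
    {
        close(fd);
        return -1;
    }

    void *p = mmap(nullptr, 1, PROT_READ | PROT_WRITE, map_flags, fd, 0);
    if (p == MAP_FAILED)
    {
        close(fd);
        return -1;
    }

    *static_cast<unsigned char *>(p) = value;   // Write through the mapping
    munmap(p, 1);

    unsigned char in_file = 0;
    ssize_t n = pread(fd, &in_file, 1, 0);      // What is in the file now?
    close(fd);
    return n == 1 ? in_file : -1;
}
```

On a fresh file, a MAP_PRIVATE write of 7 leaves the file byte at 0, while the same write through MAP_SHARED leaves 7 in the file.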

operator bool() const

Returns true if the map_data object was initialized successfully.  An application should test the map_data<T> object to see whether the object was initialized correctly.

void open(const char *filename, size_t size, int flags, size_t map_address)

Opens or creates the given map file.

void close()

Closes the given map file.

T *operator ->()

Accesses members of the persistent object.

T &operator*()

Returns a reference to the persistent object.

void lock()

Don't use this; use persist::lock instead.  Locks the data structure for thread-safe and process-safe access to the data.  This provides a mutex shared between all processes (on Win32 and NPTL), or just a process-wide mutex (linuxthreads).  Therefore, if you only have linuxthreads, you cannot safely share memory between processes.

void unlock()

Unlocks the mutex locked by lock().

void *malloc(size_t size)

A programmer would not normally call this function directly.  C++ programmers should prefer type-safe equivalents, such as the STL, persist::allocator<T>, operator new, or persist::owner<T>.

Allocates size bytes of persistent data from the map.  If the map was invalid, or for some reason the map could not be extended, this function returns null.  Otherwise, it returns a pointer to the memory allocated.

void free(void *, size_t)

Frees (recycles) memory previously allocated by malloc(). This memory can then be reused, otherwise the map file will just grow and grow.
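To illustrate why recycling keeps the file from growing, here is a toy heap over a fixed buffer.  It is a sketch of the general technique (a bump pointer plus per-size free lists), not Persist's actual allocator; the class ToyHeap and its allocate/release methods are hypothetical names.  As with free(void *, size_t) above, the caller passes the block size back when releasing.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// A toy heap over a fixed buffer.  Released blocks are kept in per-size
// lists and handed out again before any fresh space is used.
class ToyHeap
{
public:
    explicit ToyHeap(std::size_t capacity) : buffer_(capacity), top_(0) {}

    // Returns a block of at least `size` bytes, or null when the heap is full.
    void *allocate(std::size_t size)
    {
        size = round_up(size);
        std::vector<void *> &recycled = free_lists_[size];
        if (!recycled.empty())
        {
            void *p = recycled.back();      // Reuse a released block
            recycled.pop_back();
            return p;
        }
        if (size > buffer_.size() - top_)
            return nullptr;                 // No room left
        void *p = buffer_.data() + top_;    // Carve out fresh space
        top_ += size;
        return p;
    }

    // The caller supplies the size, so no per-block header is needed.
    void release(void *p, std::size_t size)
    {
        if (!p)
            return;
        unsigned char *q = static_cast<unsigned char *>(p);
        assert(q >= buffer_.data() && q < buffer_.data() + buffer_.size());
        free_lists_[round_up(size)].push_back(p);
    }

    // High-water mark: how much of the buffer has ever been handed out.
    std::size_t bytes_used() const { return top_; }

private:
    static std::size_t round_up(std::size_t n)
    {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    std::vector<unsigned char> buffer_;
    std::size_t top_;
    std::map<std::size_t, std::vector<void *> > free_lists_;
};
```

Allocating, releasing and allocating the same size again returns the same block and leaves bytes_used() unchanged.  Without the release, every allocation would push the high-water mark further.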

persist::owner<T>

A persist::owner<T> is just like a std::auto_ptr<T>, but is less broken and can safely be used in the STL.  It always deletes the object from the persistent heap when the owner goes out of scope.

create(), create(const T&)

Creates an object.

destroy()

Destroys an object.

persist::allocator<T>

An allocator compatible with the STL allocator std::allocator<T>.  This allocator allocates data on the persistent heap.  Note the common STL containers map, multimap, set, multiset, hash_map, hash_set, hash_multimap, hash_multiset, vector, list, basic_string, string, wstring have all been defined in the persist namespace to use persist::allocator.

persist::map, persist::multimap, persist::set, persist::multiset, persist::hash_map, persist::hash_multimap, persist::hash_set, persist::hash_multiset, persist::vector, persist::list, persist::basic_string, persist::string, persist::wstring

These are the standard STL containers, but the allocator has been replaced by persist::allocator so they can safely be used in persistent memory.  The default allocator will not work.
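How a container is redirected to a different heap can be shown with a minimal C++11 allocator.  The sketch below is not persist::allocator; counting_allocator is a hypothetical stand-in that forwards to operator new and merely records how many bytes are live, where a persistent allocator would take memory from the mapped file instead.  The aliases at the end mirror the style of definitions described above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <vector>

namespace sketch
{
    // Bytes currently handed out by counting_allocator.
    std::size_t bytes_live = 0;

    // Minimal C++11 allocator.  A persistent allocator would obtain memory
    // from the map here rather than from operator new.
    template <class T>
    struct counting_allocator
    {
        typedef T value_type;

        counting_allocator() {}
        template <class U>
        counting_allocator(const counting_allocator<U> &) {}

        T *allocate(std::size_t n)
        {
            bytes_live += n * sizeof(T);
            return static_cast<T *>(::operator new(n * sizeof(T)));
        }

        void deallocate(T *p, std::size_t n)
        {
            assert(bytes_live >= n * sizeof(T));
            bytes_live -= n * sizeof(T);
            ::operator delete(p);
        }
    };

    // All instances are interchangeable.
    template <class T, class U>
    bool operator==(const counting_allocator<T> &, const counting_allocator<U> &)
    {
        return true;
    }

    template <class T, class U>
    bool operator!=(const counting_allocator<T> &, const counting_allocator<U> &)
    {
        return false;
    }

    // Standard containers with the allocator swapped.
    template <class T>
    using vector = std::vector<T, counting_allocator<T> >;

    typedef std::basic_string<char, std::char_traits<char>,
                              counting_allocator<char> > string;
}
```

A sketch::vector<int> or sketch::string takes all its storage through counting_allocator, while a plain std::vector<int> never touches it.  This is the same reason the default allocator cannot place data in the map.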

persist::lock

This object defines a critical region.  It locks the shared data so that only one thread or process can access it.  The shared data remains locked for the scope of this object.  Example:

class Root
{
public:
    int number;

    Root() : number(0) { }

    int get_number()
    {
        lock l;
        return number++;    // Increment safe for concurrent access
    }
};
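The reason to prefer a scoped lock object over calling lock() and unlock() by hand is that the unlock cannot be forgotten, even on an early return or an exception.  The following sketch shows the pattern with a std::mutex standing in for the map's mutex; it is not Persist's implementation, and the names in the sketch namespace are invented for the illustration.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

namespace sketch
{
    std::mutex map_mutex;   // Stand-in for the mutex guarding the shared map
    int number = 0;         // Stand-in for a value stored in the map

    // Locks on construction, unlocks on destruction.
    class lock
    {
    public:
        lock()  { map_mutex.lock(); }
        ~lock() { map_mutex.unlock(); }

        lock(const lock &) = delete;
        lock &operator=(const lock &) = delete;
    };

    int get_number()
    {
        lock l;             // Held until the function returns
        return number++;
    }

    void update_then_fail()
    {
        lock l;
        ++number;
        // The destructor of l still runs while the exception propagates,
        // so the mutex is released.
        throw std::runtime_error("failed while holding the lock");
    }
}
```

After update_then_fail() throws, the mutex is free again and later calls to get_number() proceed normally.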

persist::fixed_string<N,C>

fixed_string is a fixed-length string.

It stores a 1-byte length field, so the length may not exceed 255 characters, and the string is zero-terminated.  The overhead per string is just two bytes (the length field and the terminator), so a fixed_string can offer more compact storage than basic_string.  It also supports common operators like assignment and comparison, without resorting to strcmp(), strncpy() or strlen().
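The layout described here can be sketched as follows.  This is an illustration of the idea, not the real persist::fixed_string: the class name fixed_string_sketch is made up, it handles only char rather than a general character type C, and truncating over-long input is an assumption about behaviour the text does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Layout sketch: one length byte, N characters, one terminating zero.
template <std::size_t N>
class fixed_string_sketch
{
    static_assert(N <= 255, "length must fit in one byte");

public:
    fixed_string_sketch() : len_(0) { data_[0] = '\0'; }
    fixed_string_sketch(const char *s) { assign(s); }

    fixed_string_sketch &operator=(const char *s)
    {
        assign(s);
        return *this;
    }

    const char *c_str() const { return data_; }
    std::size_t size() const { return len_; }   // No strlen() needed

    bool operator==(const fixed_string_sketch &other) const
    {
        return len_ == other.len_ &&
               std::memcmp(data_, other.data_, len_) == 0;
    }

    bool operator<(const fixed_string_sketch &other) const
    {
        return std::strcmp(data_, other.data_) < 0;
    }

private:
    // Copies at most N characters; longer input is truncated (an assumption).
    void assign(const char *s)
    {
        assert(s != nullptr);
        std::size_t n = std::strlen(s);
        if (n > N)
            n = N;
        std::memcpy(data_, s, n);
        data_[n] = '\0';
        len_ = static_cast<unsigned char>(n);
    }

    unsigned char len_;     // 1-byte length field
    char data_[N + 1];      // Characters plus the terminator
};
```

With this layout, sizeof(fixed_string_sketch<14>) is 16: fourteen characters plus the two bytes of overhead, with no separate heap allocation.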

operator new(size_t, persist::map_file)

This is used internally.  If you don't fancy using persist::owner, or persist::allocator, you can write

X *x = new(appdata) X();

to create a new object x in persistent memory.  Generally, C++ programmers should prefer to use the STL.
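The mechanism behind this syntax is an overload of operator new that takes an extra argument.  Below is a minimal sketch with a fixed buffer standing in for the map; Arena, its members and Point are hypothetical names, and the real overload in Persist takes a map rather than this toy type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Stand-in for the map: a fixed buffer with a bump pointer.
struct Arena
{
    alignas(std::max_align_t) unsigned char bytes[256];
    std::size_t used = 0;

    // Reports whether `p` points into this arena's buffer.
    bool contains(const void *p) const
    {
        std::uintptr_t q = reinterpret_cast<std::uintptr_t>(p);
        std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(bytes);
        return q >= begin && q < begin + sizeof bytes;
    }
};

// Placement form: `new(arena) X(...)` calls this to obtain storage.
void *operator new(std::size_t size, Arena &arena)
{
    const std::size_t a = alignof(std::max_align_t);
    size = (size + a - 1) / a * a;          // Keep later blocks aligned
    if (size > sizeof arena.bytes - arena.used)
        throw std::bad_alloc();
    void *p = arena.bytes + arena.used;
    arena.used += size;
    return p;
}

// Matching placement delete, called only if the constructor throws.
void operator delete(void *, Arena &) noexcept {}

struct Point
{
    int x, y;
    Point(int x_, int y_) : x(x_), y(y_) {}
};
```

With these definitions, new(arena) Point(3, 4) constructs the object inside the arena's buffer instead of on the ordinary heap.  An object created this way is not released by a plain delete expression; it has to be destroyed explicitly and its storage returned to the arena that supplied it.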

What can be stored in persistent memory?

Almost all normal data types can be stored, including built-in data types (ints, floats, doubles, enums etc), structures and arrays.  Pointers may also be stored, but they will only be valid if they point to data in the persistent heap.  If they point to data outside the heap, guess what?  They will probably be broken the next time the program is run.

Objects created using the STL are also perfectly valid, provided they use the allocator persist::allocator.  For example,

#include "persist_stl.h"

struct AppData
{
    std::vector<int> primes1;      // Invalid, since the vector is not stored in persistent memory
    std::vector<int, persist::allocator<int> > primes2;   // Okay
    persist::vector<int> primes3;  // Okay

    std::string name1;             // Invalid, since the string body is not stored in persistent memory
    persist::string name2;         // Okay
};

persist::map_data<AppData> app_data("appdata.map");

Most of the common containers have been defined in <persist_stl.h> to use persist::allocator.

Using multiple maps

Multiple maps sound like a great idea for transferring data from one map to another, like files.  But there are problems.  Maps cannot overlap one another in memory, and must be created with an explicit "map_address" (see map_data::open()) to prevent this.  The auto_grow facility might result in a map growing so much that it needs relocating.  Finally, there is no guarantee that the OS is able to map data into your address space.  Perhaps the address space is already used by something else, like a thread or virtual memory?

persist::allocator allocates data on the "current" map.  The map_data<T>::select() method selects which map to write to.

Troubleshooting

Failed to create map

This is indicated by an invalid map address.  Try deleting the map and trying again.  Perhaps the filename you specified was invalid.  Perhaps the map could not be loaded to the same location that it was created in.  This is a particular hazard of maps that grow automatically.

Segmentation fault

The bane of every C programmer.  The first thing to do is delete the memory file, which could have gotten corrupted.

You might have stored a pointer in your file.  Memory pointers to objects outside the persistent storage area will not work.  You might have corrupted your file in some mysterious way.

You might have changed your data structures.  This will completely invalidate all your stored data - that's the deal.  Delete the map file and start afresh.

You might have used an ordinary STL container instead of a persistent container defined in <persist_stl.h>.  Ordinary STL containers do not work, since they sometimes allocate objects, and don't know to allocate data in the persistent storage.  The persist containers are defined with a special allocator so that this works.

Something significant might have happened to your system to change it.  Delete the map file and start again.  Your persistent file may have been created on another machine.  This may have different storage characteristics.

Two processes may be accessing the data at the same time.  You need a mutex.  Use persist::lock.

You might have used the same filename for two different applications, with different root types.

Memory leaks

After prolonged use, does the memory file just get larger and larger?  This could be due to a memory leak.  Basically, you have created an object using new or malloc, then forgotten about its pointer.  Or one of your containers keeps having more and more data inserted into it.

A memory file only grows; it does not shrink.  When you empty the contents of a memory file, the disk space it uses is still reserved for future use.

Examples

Command line history

// cmdline.cpp
// Stores a vector of strings (the command line) and redisplays them

#include <cstring>      // strcmp
#include <iostream>
#include <new>          // std::bad_alloc
#include <string>
#include <persist_stl.h>
using namespace persist;

class History
{
public:
    vector<string> commands;   // This is persist::vector<persist::string>
};

using namespace std;


int main(int argc, char*argv[])
{
    // Declare the map: A History object is stored in the file "history.map"
    map_data<History> history("history.map");

    if(history)     // Check that the map was successfully created
    {
        try
        {
            if(argc==2 && strcmp(argv[1], "erase")==0)
            {
                // Erase the history
                history->commands.clear();
            }
            else
            {
                // Add the current command line to the history

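                // Build the line in ordinary memory first
                std::string line = argv[0];
                for(int i=1; i<argc; ++i)
                {
                    line += " ";
                    line += argv[i];
                }

                // std::string and persist::string are different types,
                // so copy the text into the map via c_str()
                history->commands.push_back(line.c_str());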
            }

            // Display the history

            for(size_t l=0; l<history->commands.size(); ++l)
                cout << history->commands[l] << endl;
        }
        catch(const bad_alloc &)
        {
            cout << "Memory allocation failed\n";
            return 2;
        }
    }
    else
    {
        cout << "Could not create map\n";
        return 1;
    }

    return 0;
}